Abstract


Ever-increasing operational and technical requirements have led to highly integrated flight, guidance and control, and weapons
delivery systems. The effective implementation of these functions makes the fusion and interpretation of sensor data and the
multifunctional use of sensor information inevitable.

Neural networks, consisting of parallel microcomputing elements, hold great promise for guidance, navigation and control
applications because of their ability to learn and acquire knowledge.

The Lecture Series will bring together a group of NATO nation speakers with outstanding experience in this new area of
technology. First they will review the fundamentals of neural networks to serve as background so that advances in this new,
rapidly evolving technological area can be both understood and appreciated. They will then discuss a number of related
applications of direct benefit to the attendees.

This  Lecture  Series,  sponsored  by  the  Guidance  and  Control  Panel  of  AGARD,  has  been  implemented  by  the  Consultant  and 
Exchange  Programme. 


Abrégé (translated from the French)


Ever more numerous technical and operational requirements have led to highly integrated flight control, guidance and control,
and weapons delivery systems. The effective implementation of these systems inevitably involves the fusion and interpretation
of sensor data.

Neural networks, which consist of micro-computing elements arranged in parallel, are very promising for applications in the
field of guidance, control and navigation because of their ability to learn.

This Lecture Series brings together a group of speakers from the NATO member countries with exceptional experience in this
new technological field. The fundamental aspects of neural networks will be addressed first, to allow an assessment of the
progress made in this rapidly expanding field. A number of related applications, of particular interest to the participants,
will then be discussed.

This Lecture Series is presented within the framework of the Consultant and Exchange Programme, under the sponsorship of the
Guidance and Control Panel of AGARD.


INTRODUCTION


Albert J. Shapiro

GEC-Marconi Electronic Systems Corp.


INTRODUCTION 

The  objective  of  this  Lecture  Series  is  to  present  both  the  fundamentals  of  neural  networks  and  a 
number  of  related  Guidance,  Navigation  and  Control  (GNC)  applications.  The  lecturers  come  from 
several  of  the  participating  AGARD  countries,  specifically  Canada,  France,  Germany,  the  United 
Kingdom,  and  the  United  States.  We  will  have  nine  lectures  and  conclude  with  a  round-table 
discussion  involving  all  participants. 

PERSPECTIVE 

The  digital  computer  has  impacted  GNC  from  two  aspects.  It  makes  possible  the  implementation  of 
large  embedded  systems  utilizing  complex  algorithms  and  control  logic.  It  has  also  become  the 
primary  engineering  tool  which  makes  possible  the  design  and  analysis  of  such  systems. 

Advances  in  computational  speed  have  enabled  the  real-time  implementation  of  algorithm-intensive  GNC 
solutions  for  both  aircraft  and  missiles.  Application  of  Kalman  filtering  in  hybrid-inertial 
navigation  and  optimized  flight  control  applications  in  airframes  with  complex  flexure  patterns  are 
examples  of  practically  successful  design  and  hardware/software  integration.  Kalman  filtering 
allows  for  less  precise  sensors  to  be  synergistically  integrated  through  software  to  provide 
improved  overall  system  performance. 

Airborne missions have become more complex and stressful to the pilot. Scenarios now require threat
avoidance, rapid replanning and reconfiguration of navigation modes in the presence of jamming of
navigation aids such as GPS, emission management in heavily defended areas, and continuous
evaluation of avionics system status in terms of fault detection and isolation and fault-tolerant
reconfiguration. Reducing the pilot's workload by relegating more diagnostic and decision-making
functions to the computer has become a necessity.

The application of Artificial Intelligence or Expert Systems to these applications is a significant
step in this direction. In conventional problem-solving, deterministic responses are produced for
anticipated circumstances but unanticipated situations cannot be properly processed. On the other
hand, the Expert System approach has additional information built into its Knowledge Base,
approximating the resources of a skilled problem solver. The Inference Engine provides the
mechanism to attack the problem with these resources.

However, to quote Dr. Bowen from one of his previous AGARD-sponsored lectures, 'An expert system,
nonetheless, is quite similar to a real-time control system; for example, both are command and event
driven, have feedback loops, require the same instrumentation packages, and access the same kind of
data from conventional data bases.'

Compare this to the prospect of the machine duplication of functions of the human brain in which,
somehow, a natural network of neurons, composed of interconnected living nerve cells, thinks, feels,
learns and remembers.

Scientists and engineers are primarily interested in models inspired by brain function and not
necessarily the achievement of biological fidelity. The objectives of engineering research in
artificial neural networks (ANNs) are to understand how the brain's 'computations' are organized and
carried out and then to understand the class of neural network models that replicate this
'computational power'.

The increasing interest in ANNs has been aided by both technological advances and a deeper
understanding of how the brain works. A driving force is the need for a new breed of powerful
computers to solve a variety of problems that are proving to be very difficult for conventional
digital computers. Cognitive tasks such as pattern recognition under real-world conditions, pattern
matching, and combinatorial optimization are some examples. Tasks such as recognizing a familiar
face, learning to speak and understand a natural language, and retrieving contextually appropriate
information from memory are typically performed naturally by the brain, but are beyond the reach of
conventionally programmed computers as well as the rule-based expert systems.

Neurocomputing, that is, nonprogrammed adaptive information processing by artificial neural
networks, is a fundamentally new and different information-processing paradigm: the first
alternative to algorithmic programming. It holds the potential for significant breakthroughs in the
field of GNC, namely systems which can learn and rapidly accommodate to a wide variety of internal and
external stimuli occurring in nonpredetermined combinations. Rapid reaction to
unforeseen combinations and types of threats and aerodynamic changes, and autonomous vehicles capable of
self guidance, are but examples of such leaps in capability.

With the foregoing in mind, this Lecture Series has two major themes: a tutorial introduction to
ANNs and applications of the overall technology of ANN to the Guidance and Control field.

I  hope  that  these  papers  will  be  as  informative  to  you  as  I  am  sure  they  will  be  to  me. 


REFERENCES 

1. Quinlaven, R. P., 'Knowledge-Based Concepts And Artificial Intelligence Applications To
Guidance And Control', AGARD Lecture Series No. 155.

2. Bowen, B. A., 'Real Time Expert Systems: A Status Report', AGARD Lecture Series No. 155.

3. Vemuri, V., 'Artificial Neural Networks: Theoretical Concepts', The Computer Society Of The
IEEE.


INTRODUCTION TO NEURAL COMPUTING AND CATEGORIES OF
NEURAL NETWORK APPLICATIONS TO GUIDANCE, NAVIGATION AND CONTROL

by 

Uwe K. Krogmann

Bodenseewerk Gerätetechnik GmbH
Intelligent Systems Division
Nussdorfer Str. - D-7770 Überlingen
FRG


1.  Introduction 


'Future computer generation imitates man'. 'Many small cells are stronger than one large cell!'. Such headlines are to be found in
the media in connection with a new kind of information processing, the so-called 'Artificial Neural Networks (ANN)'. As the term suggests,
these networks are an attempt to imitate the biological paradigm, our brain, in structure and function.


In the course of evolution our central nervous system (brain and spinal cord) has developed into a gigantic information-processing
network to which the sensory paths from sense organs lead and from which the motor paths lead to the muscles. All stimuli are supplied to the
central nervous system where they are processed into perceptions, sensations etc. and trigger off our actions.


In our organism many organ systems work together. Only the central nervous system communicates as superior system with all
others by collecting their information and coordinating their functions.


Basically similar problems will be found in future technical equipment and systems. Based on the structure of the biological brain,
the creation of artificial neural networks (abbreviated ANN) is aimed to technically realize capabilities and characteristics such as
self-organisation, learning and associative memory. This is achieved by the particular structure of neural networks where a large number of simple
processor elements (PE) are interconnected with uni-directional signal channels to single- or multi-layer networks. All processing elements are
working in parallel as compared to one central, extremely efficient computer for sequential arithmetic and/or symbolic information processing.


For the solution of a problem with a conventional computer (e.g. personal computer (PC)) an algorithm, a procedure or a set of
rules has to be developed and coded in software, i.e. a sequence of instructions. These instructions are then carried out sequentially by the
computer.


By contrast, ANNs are not programmed but trained and learn like their biological paradigm, the brain. This is done by changing
the intensity of the connections between the processor elements and by generating or eliminating structural connections. Thus the
'knowledge' of an ANN lies in the topology and in the intensity of its connections, i.e. the strength of the connection weights between the PEs.


With their capabilities of self-organisation, learning (adaptation) and association, ANNs can be used wherever it is difficult to
describe a problem algorithmically, the development of the operational software is very cost-intensive or wherever imprecise, incomplete or
even contradictory input data must be considered. Owing to the parallel information processing ANN are fault tolerant and thus very reliable.


Ever-increasing requirements placed on more demanding and complex systems on the one hand and financial resources getting
increasingly scarce on the other force us to filter out key technologies showing the potential for a favorable cost-benefit ratio to meet the increased
requirements. In this respect Artificial Neural Networks represent a new technology in the field of signal and information processing for
Guidance and Control systems. This article is intended to give a short introduction into ANN and their application in guidance and control.


2. General Structure of Guidance and Control Problems


G.a.C. problems extend over several hierarchically structured levels and the communication functions between these levels as
shown in Fig. 1. The represented interconnection of the different function levels (scenario, mission, trajectory, air vehicle state) can be
conceived of as a hierarchically structured control system. The objects on which G.a.C. functions are performed on the mentioned levels
represent the control plants. Information processing by which actuation is generated on all levels from sensor information represents the
controller which is of primary concern here (Fig. 2).


The controlling feedback chain typical of all G.a.C. levels requires functions such as recognizing and assessing the situation;
defining action goals; generating optimum or favorable solutions; decision-making; planning and finally performing as well as monitoring of
actions. Hence, behavior levels of mental capabilities can be assigned to the function levels (Fig. 1).


FIGURE 1


COMMON  BASIC  STRUCTURE  OF  G.a.C.  LEVELS 


CONTROLLER/INFORMATION  PROCESSING 


FIGURE  2 


For reasons of human limitations in more demanding dynamic scenarios and in the operation of complex, highly integrated
systems, there is the necessity for extended automation of these functions on higher levels such as trajectory control as well as mission
management and control. Furthermore, the implementation of intelligent functions on lower levels such as the fusion and interpretation of
sensor data, multifunctional use of sensor information and smart/brilliant sensors becomes inevitable.


The technical implementation of the intelligent G.a.C. feedback chain functions leads to a signal processing structure which
contains conventional arithmetic, symbolic and sub-symbolic elements (Fig. 3). Whereas the symbolic element can be implemented utilizing
expert-system software techniques, the sub-symbolic element represents the application of ANNs. In building ANNs the brain is utilized as
biological paradigm. In the following its function and structure are to be briefly explained as far as this is important for understanding ANNs.


Activity Chain Implementation Elements


FIGURE  3 


3. The biological brain as paradigm


Function and structure


Two different functions of the brain are to be looked at. First, there is rational thinking, which functions in conscious steps
performed in a particular serial sequence. The digital computers we use today, with sequential processing of instructions listed in programs
(computers in so-called von Neumann architecture), were developed in the 1940s based on the investigation of sequentially conscious
thinking.


On the other hand, there are the much more complex structures of unconscious thinking or unconscious intelligence. Here, a lot of
environment data are processed within the context of our sensory perception and characteristics extracted. The sensorimotor control of our
motions as well as three-dimensional thinking are largely unconscious. The structures of unconscious thinking provide the basis for the
enormous capacity of our memory. All of these functions performed unconsciously are running in parallel in networks in which so-called neurons
interact due to a close interconnection and by means of electrochemical processes.


Our brain is organized as a highly integrated system in functional units, which are interconnected via variable connections, with each
functional unit having about one thousand to one hundred thousand nerve cells. These each have ten to ten thousand equally variable,
so-called synaptic connections to other neurons. In total, our central nervous system roughly contains the astronomical number of one hundred to
one thousand billion nerve cells. It is clear that this enormous information-processing system cannot be completely structured and
programmed prenatally even if genetic information is taken into account. The brain has the capability to organize itself, learn and establish
associations.


To imitate biological information processing, models for different levels of organisation and of abstraction have to be considered.
First, there is the level of the individual neuron where it is a matter of representing the static and dynamic electrical characteristics as well as
the adaptive behavior of the neuron. On the network level the interconnection of identical neurons to form networks is examined to describe
specific sensor- and motoricity-related functions such as filtering, projection operations, controller functions in nonlinear, biological systems.
Networks on the mental function level are the most complicated ones and comprise functions such as perception, solution of problems,
strategic proceeding etc. These are the networks on the highest level of biological information processing.


The  Neuron 


The nerve cell (the neuron) comprises the cell body (soma) which surrounds the cell nucleus (Fig. 4). The cell body has a long
processus, the axon (or neurite), which ends in numerous ramifications which are attached to other cells via so-called synaptic end heads, thus
forming the synaptic connections between neurons. The synaptic connection is to points where the cell body is expanded to so-called
dendrites.


In the stationary electro-chemical state the cell has a resting potential of about -80 mV. If a nerve cell is stimulated by another cell
via the synaptic connection, a short-time pole reversal of about 30 millivolts with a duration of 1 millisecond results (Fig. 5). This so-called
action or activation potential moves with up to 100 meters per second across the axon to neighboring cells. The stimulus must exceed a
specific threshold so that this action potential is generated. The action potential immediately drops after its rise and the cell returns to the
resting state.


The degree of cell stimulation, i.e. the density of information, is determined by the frequency of the action potentials. The greater
the stimulus, the higher the sequence of impulses on the axon.


SIMPLIFIED BIOLOGICAL NEURON

FIGURE 4


What is very important for the learning and adaptive capabilities of biological neural networks is the so-called plasticity
characteristic of the synapses. This characteristic gives the neurons a memory such that their reaction to an impulse received depends on their
past history, i.e., for instance, how many impulses have already been transmitted by it before and in which sequence. In this process, the past
history is taken into account over minutes, hours, yes even over much longer periods of time (long-term memory).


Apart from the stimulating neurons there are also inhibitive neurons. These produce transmitters which increase the negative
charge in the interior and thus the resting potential of the receiving cell. These inhibitive neurons can blank out action potentials of stimulating
neurons, which are transmitting simultaneously, in the joint receiving neuron. Hence, all potentials received via synaptic connections are
added on the receiving neuron; those from stimulating neurons with a plus sign and those from inhibitive neurons with a minus sign. The sum
of all inputs triggers the neuron activation via a nonlinear activation function.


EXCITATION  PATTERN  IN  RESPONSE  OF  A 
STIMULUS  (ACTIVATION  POTENTIAL) 


FIGURE  5 


What is remarkable in this connection:

Today's digital computers have cycle times (time for processing a partial information) of 4 to 5 nanoseconds. The comparable cycle
time of a neuron (time for processing a stimulus up to readiness for receiving a new stimulus) is 4 to 5 milliseconds. Thus the digital computer
is a million times faster than the neuron. Despite this enormous difference in the time for processing a piece of information and for reacting to
a stimulus, neural networks are in many applications superior to digital computers with sequential processing of often extensive programs
regarding the execution time, due to their parallel information processing.


4. The Artificial Neuron


On the basis of the biological neuron a simplified model of the artificial neuron is shown in Fig. 6. According to this figure, the
neuron acts like an integrator with feedback which integrates the weighted presynaptic input signals and maps the postsynaptic activation x_j
onto the output signal by means of one of the nonlinear functions shown as examples. A threshold value b_j is also taken into account. The
differential equation for the activation x_j of the j-th neuron, taking into consideration the learned connecting weights w_ji, describes the
short-term behavior of the neuron (short-term memory, STM) as a function of the input signals s_i; i = 1, 2 ... n. In this equation the inhibitory inputs are
taken into account by the term h_j (see Fig. 6). Because activation is a nonnegative entity the condition x_j >= 0 must be imposed.

The ability to learn and memorize something owing to the plasticity property of the biological neuron is, in the case of the artificial
neuron, obtained by adaptation of the connecting weights. As a consequence, the short-term behavior of an element is made dependent on its
case-history (long-term memory, LTM).


MATH. MODEL OF THE NEURON (differential equations for the activation x_j and the connecting weights w_ji)


FIGURE 6


The differential equation for the adaptation of the connecting weights as shown in Fig. 6 describes the dynamics of learning as a
function of the instantaneous values of the connecting weights (input weights), the activation and the input quantity. For controlling the
learning speed the function η(t) is also introduced. The connecting weight leading from the input i to the j-th neuron is called w_ji.


Depending on the particular form of the x_j- and w_ji-equations there are different neural and network models (paradigms). A
partitioning into excitatory and inhibitory input signals is, however, not necessarily required if the former are considered to be positive and the
latter negative input signals and the activation x_j can also assume negative values.

A simplification of the neural model according to Fig. 6 considers the stationary activation status and is shown in Fig. 7. It was
introduced by McCulloch and Pitts (1943). The resulting output signal is


$$ s_j = f\left(\sum_{i=1}^{n} s_i w_{ji} - b_j\right) \qquad (1) $$


Instead of subtracting the threshold b_j from the sum of the weighted input signals, it can be interpreted as an additional weight
w_j0 = -b_j with a constant input s_0 = 1 such that the activation equation becomes

$$ s_j = f\left(\sum_{i=0}^{n} s_i w_{ji}\right) \qquad (2) $$
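The equivalence between an explicit threshold and an extra weight on a constant input can be checked numerically. The following is a minimal sketch in plain Python, not taken from the lecture: the function names, the binary step output function and the AND-like weights are assumptions chosen for illustration only.

```python
def step(a):
    """Binary step output function (assumed convention: 1 for a >= 0, else 0)."""
    return 1 if a >= 0 else 0


def neuron_with_threshold(inputs, weights, b, f=step):
    """Output f(sum_i s_i * w_ji - b_j): threshold subtracted explicitly."""
    return f(sum(s * w for s, w in zip(inputs, weights)) - b)


def neuron_with_bias_weight(inputs, weights, b, f=step):
    """Same neuron with the threshold folded in as weight -b_j on a constant input 1."""
    ext_inputs = [1.0] + list(inputs)      # s_0 = 1
    ext_weights = [-b] + list(weights)     # w_j0 = -b_j
    return f(sum(s * w for s, w in zip(ext_inputs, ext_weights)))


# Hypothetical two-input neuron that behaves like a logical AND.
weights, b = [1.0, 1.0], 1.5
for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a = neuron_with_threshold(pattern, weights, b)
    c = neuron_with_bias_weight(pattern, weights, b)
    print(pattern, a, c)
```

Both forms give the same output for every pattern; the second form is convenient because the threshold can then be adapted by the same rule as any other weight.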


Based on this equation the artificial neuron can be represented as a basic processor element (PE) as shown in Fig. 8. It is
remarkable that the summation of the weighted input signals is mathematically identical with the scalar product of the input and weight vector.
Geometrically it is thus a measure for the correlation between the input vector s and the instantaneous weight vector w_j of the j-th PE, as
shown in the following equation:

$$ \sum_{i} s_i w_{ji} = \mathbf{s}^T \mathbf{w}_j = \|\mathbf{s}\| \, \|\mathbf{w}_j\| \cos\alpha_j \qquad (3) $$

Therefore the PE can be imagined to perform a vector pattern matching operation. Equation (3) can be looked at as the
fundamental equation of all adaptive networks.
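The pattern-matching interpretation of the scalar product can be illustrated with a few lines of plain Python. This is a sketch under assumed example vectors; the helper names are hypothetical and not part of the lecture.

```python
import math


def dot(u, v):
    """Scalar product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def match_cosine(s, w):
    """cos(alpha) between input s and weight vector w, from s.w = |s| |w| cos(alpha)."""
    return dot(s, w) / (norm(s) * norm(w))


w = [1.0, 2.0, 0.0]              # assumed stored weight vector of one PE
aligned = [2.0, 4.0, 0.0]        # same direction as w
orthogonal = [0.0, 0.0, 3.0]     # nothing in common with w
opposite = [-1.0, -2.0, 0.0]     # reversed pattern

for name, s in [("aligned", aligned), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(name, round(match_cosine(s, w), 6))
```

An input pointing in the direction of the weight vector gives the largest weighted sum for a given input length, an orthogonal input gives zero, and a reversed input gives the most negative value.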


McCULLOCH-PITTS NEURON


FIGURE  7 


ARTIFICIAL NEURON


FIGURE 8


The PE according to Fig. 8 has two properties representing important preconditions for the arrangement in parallel network
structures: only local inputs and local w_ji memory (i.e. no other information from a network is required); only one output signal is produced
which is propagated to other PEs or represents an output element of a network.


There is a large number of potential nonlinear output functions (threshold functions). The signum and sigmoid functions, the
hyperbolic tangent, the binary step function and the saturated ramp function are employed by the majority of PE types.
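The output functions named above can be written down compactly. The sketch below uses common textbook definitions; the value chosen at exactly zero for the signum and step functions, and the ramp limits of 0 and 1, are assumptions, since the lecture does not specify them.

```python
import math


def signum(a):
    """Signum output: +1 for a >= 0, otherwise -1 (value at 0 is a convention)."""
    return 1.0 if a >= 0 else -1.0


def binary_step(a):
    """Binary step output: 1 for a >= 0, otherwise 0."""
    return 1.0 if a >= 0 else 0.0


def sigmoid(a):
    """Logistic sigmoid, a smooth step between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))


def hyperbolic_tangent(a):
    """Hyperbolic tangent, a smooth signum between -1 and +1."""
    return math.tanh(a)


def saturated_ramp(a, low=0.0, high=1.0):
    """Linear between the limits, clipped to low and high outside them."""
    return max(low, min(high, a))


for a in (-2.0, 0.0, 0.5, 2.0):
    print(a, signum(a), binary_step(a), round(sigmoid(a), 3),
          round(hyperbolic_tangent(a), 3), saturated_ramp(a))
```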


From a neurophysiological point of view the artificial neuron as described here is too simple. For the designer of advanced
adaptive systems this is, however, no restriction. Utilizing important characteristic features of the biological paradigm like interconnection of a
large number of simple PEs with parallel processing and adaptable connection weights as well as nonlinear output functions yields the
possibility for the design of processing units with unprecedented capabilities.


5. Adaptive Processing Elements




The Adaptive Linear Combiner (ALC) is known from adaptive signal processing. In Fig. 9, x_k = [x_0k, x_1k, ..., x_Lk]^T is the input vector
and w_k = [w_0k, w_1k, ..., w_Lk]^T the weight vector of the ALC at time t_k. The output quantity as the sum of the weighted input quantities is


$$ y_k = \sum_{l=0}^{L} w_{lk} x_{lk} = \mathbf{w}_k^T \mathbf{x}_k = \mathbf{x}_k^T \mathbf{w}_k \qquad (4) $$


The same relationship applies for a Finite Impulse Response (FIR) filter if x_k = [x_k, x_{k-1}, ..., x_{k-l+1}]^T is set there, x_{k-i} with i = 1, 2 ... (l-1)
being the filter input quantities delayed by one clock cycle each (delay operator z^{-1}).
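The relation between the linear combiner and the FIR filter can be sketched as follows: the filter is simply a combiner whose input vector is a tapped delay line. The function names and the numeric weights are hypothetical; the delay line is assumed to start at zero.

```python
def alc_output(weights, inputs):
    """Adaptive linear combiner output: weighted sum of the current input vector."""
    return sum(w * x for w, x in zip(weights, inputs))


def fir_filter(weights, signal):
    """FIR filter built from the combiner by feeding it a tapped delay line."""
    taps = [0.0] * len(weights)        # x_k, x_{k-1}, ..., assumed zero before start
    outputs = []
    for x in signal:
        taps = [x] + taps[:-1]         # shift by one clock cycle (delay z^-1)
        outputs.append(alc_output(weights, taps))
    return outputs


# Assumed three-tap example; an isolated input sample reproduces the weights,
# scaled by the sample value, one tap per clock cycle.
weights = [0.5, 0.25, 0.25]
print(fir_filter(weights, [4.0, 0.0, 0.0, 8.0]))   # [2.0, 1.0, 1.0, 4.0]
```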


The weights are adapted by means of the LMS (least mean squares) algorithm which minimizes the square of the deviation of the
output quantity y_k from the desired output quantity d_k which is considered to be known. For this deviation and its square the following
equations apply:


$$ \varepsilon_k = d_k - y_k = d_k - \mathbf{x}_k^T \mathbf{w}_k \qquad (5) $$

$$ \varepsilon_k^2 = d_k^2 - 2 d_k \mathbf{x}_k^T \mathbf{w}_k + \mathbf{w}_k^T \mathbf{x}_k \mathbf{x}_k^T \mathbf{w}_k \qquad (6) $$


ADAPTIVE LINEAR COMBINER AND FIR FILTER


FIGURE 9


At each iteration in the adaptive process the gradient estimate becomes

$$ \hat{\nabla}_k = \frac{\partial \varepsilon_k^2}{\partial \mathbf{w}_k} = -2 \varepsilon_k \mathbf{x}_k \qquad (7) $$

With this simple gradient estimate the LMS algorithm is of steepest descent type by updating the weight vector according to

$$ \mathbf{w}_{k+1} = \mathbf{w}_k - \mu \hat{\nabla}_k = \mathbf{w}_k + 2\mu (d_k - y_k) \mathbf{x}_k \qquad (8) $$

The steepest descent step size parameter μ regulates speed and stability of adaptation. Adaptive signal processing based on the
ALC with LMS adaptation has been successfully applied in systems identification, adaptive noise cancelling, adaptive prediction and others.
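As an illustration of the system identification use mentioned above, the following sketch applies the LMS weight update (new weights = old weights + 2 μ (d - y) x) to a noise-free, made-up linear system. The "unknown" weights, the step size, the sample count and the random seed are all assumptions chosen for the example.

```python
import random


def lms_identify(samples, n_weights, mu):
    """Adapt combiner weights with the LMS rule over (input vector, desired output) pairs."""
    w = [0.0] * n_weights
    for x, d in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))                  # combiner output
        err = d - y                                               # deviation from desired output
        w = [wi + 2.0 * mu * err * xi for wi, xi in zip(w, x)]    # steepest-descent step
    return w


# Hypothetical unknown system to be identified (noise-free for simplicity).
random.seed(0)
true_w = [0.5, -0.3, 0.2]
samples = []
for _ in range(2000):
    x = [random.uniform(-1.0, 1.0) for _ in true_w]
    d = sum(t * xi for t, xi in zip(true_w, x))
    samples.append((x, d))

estimate = lms_identify(samples, n_weights=3, mu=0.05)
print([round(v, 6) for v in estimate])
```

With a small step size the estimate approaches the weights of the unknown system; a larger μ adapts faster but, beyond a limit set by the input power, the adaptation becomes unstable.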

The Adaptive Linear Neuron (ADALINE) as the simplest nonlinear processing element is closely related to the ALC. As shown in
Fig. 10 it utilizes the signum output function. Since the output signal from the summation is used for the error determination needed for
adaptation of the input weights, the LMS algorithm can be used here as well.


The structure of the PERCEPTRON, which is also shown in Fig. 10, is identical with that of the ADALINE. The only difference is that the
PERCEPTRON convergence algorithm uses the output signal s_k for error recognition for the weight adaptation. In both cases η is introduced
to control the adaptation/learning rate.
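The difference between the two adaptation rules lies only in where the error is measured: before the signum function (ADALINE) or after it (PERCEPTRON). A minimal sketch of one adaptation step for each, with assumed example numbers and hypothetical function names:

```python
def signum(a):
    """Signum output function (assumed convention: +1 for a >= 0, else -1)."""
    return 1.0 if a >= 0 else -1.0


def adaline_step(w, x, d, eta):
    """One LMS step: the error uses the linear sum y taken before the signum."""
    y = sum(wi * xi for wi, xi in zip(w, x))
    return [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]


def perceptron_step(w, x, d, eta):
    """One perceptron step: the error uses the output signal s taken after the signum."""
    s = signum(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + eta * (d - s) * xi for wi, xi in zip(w, x)]


w, x, d, eta = [0.25, 0.25], [1.0, 1.0], 1.0, 0.5
print(adaline_step(w, x, d, eta))      # [0.5, 0.5]
print(perceptron_step(w, x, d, eta))   # [0.25, 0.25]
```

In this example the pattern is already classified correctly, so the perceptron rule leaves the weights unchanged, whereas the LMS rule keeps adapting until the linear sum itself matches the desired value.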


Because of the nonlinear output function the ADALINE and PERCEPTRON become capable of input signal classification. They are
able to recognize whether a particular input pattern belongs to a corresponding class or not. The classifying ability of the PERCEPTRON is
illustrated in Fig. 11. For this purpose a simple element with 2 inputs and 1 output is investigated. Assuming w_i = const. after completion of
learning, the classification equation in this case describes a straight line in the input signal plane (s_1-s_2 plane). This line separates the two
classes. If the PERCEPTRON has more than two inputs the straight separating line changes into a plane (3 inputs) or a hyperplane
(> 3 inputs). The PERCEPTRON adaptation algorithm converges when classes can be separated linearly. In practice, this is frequently not the
case or not known a priori. Then, the arrangement of simple processor elements (e.g. PERCEPTRON) in multi-layer networks is required.
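The role of linear separability can be demonstrated with a two-input PERCEPTRON trained on two small, made-up pattern sets: the logical AND, which a straight line can separate, and the exclusive OR, which it cannot. This is an illustrative sketch; the learning rate, the epoch limit and the use of a bias weight on a constant input are assumptions.

```python
def signum(a):
    """Signum output function (assumed convention: +1 for a >= 0, else -1)."""
    return 1.0 if a >= 0 else -1.0


def classify(w, s1, s2):
    """Class +1 or -1 depending on the side of the line w0 + w1*s1 + w2*s2 = 0."""
    return signum(w[0] + w[1] * s1 + w[2] * s2)


def train_perceptron(patterns, eta=0.5, max_epochs=100):
    """Perceptron adaptation; returns (weights, True) once an epoch has no errors."""
    w = [0.0, 0.0, 0.0]                      # w[0] acts on the constant input 1
    for _ in range(max_epochs):
        errors = 0
        for (s1, s2), d in patterns:
            s = classify(w, s1, s2)
            if s != d:
                x = [1.0, s1, s2]
                w = [wi + eta * (d - s) * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:
            return w, True
    return w, False


AND = [((0, 0), -1.0), ((0, 1), -1.0), ((1, 0), -1.0), ((1, 1), 1.0)]
XOR = [((0, 0), -1.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), -1.0)]

w_and, ok_and = train_perceptron(AND)
w_xor, ok_xor = train_perceptron(XOR)
print("AND separable:", ok_and, w_and)
print("XOR separable:", ok_xor)
```

For the AND patterns the adaptation stops with a separating line; for the exclusive OR no single line exists, so the error never vanishes, which is the situation that calls for multi-layer networks.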

Thus,  the  transition  from  the  individual  adaptive  element  to  the  arrangement  of  such  elements  to  form  artificial  neural  networks 
becomes  necessary. 


ADALINE


LMS ADAPTATION ALGORITHM
$$ \Delta \mathbf{w}_k = \eta (d_k - y_k) \mathbf{x}_k $$


PERCEPTRON CONVERGENCE ALGORITHM
$$ \Delta \mathbf{w}_k = \eta (d_k - s_k) \mathbf{x}_k $$


FIGURE 10


PERCEPTRON CLASSIFICATION PROCESS


INPUT PATTERN ∈ (CLASS A) IF s(x) = 1
INPUT PATTERN ∈ (CLASS B) IF s(x) = -1


FIGURE 11


6. Artificial Neural Nets (ANN)


Definition


According to a definition used in literature, the term artificial neural net (ANN) means massively parallel connected arrangements
of simple elements, adaptive in general (but not necessarily), with hierarchically organised structure, from which it is expected that they interact
with the objects of the real world in a similar way as the networks in biological systems do. ANN are accordingly structured from groups of
simple processor elements that are arranged in layers (Fig. 12). Each layer comprises a certain quantity of PEs that communicate via
connections with different adaptable weights. There are intra- and inter-layer connections.

For  the  structure  of  ANN,  three  basic  elements  are  Important  organized  topology  of  Interconnected  cells  ,PEs),  method  of 
encoding  (learning)  information,  method  of  recalling  Information.  They  are  dealt  with  briefly  In  the  following. 

FIGURE 12: Artificial neural network and network processing element


ANN, a nonlinear system

The ANN dynamics can be described by a set of nonlinear differential equations for an autonomous dissipative dynamic system,

$$\dot{x}_i = f_i(x_1, x_2, \ldots, x_n), \quad i = 1, \ldots, n \qquad (9)$$

with x_i as real variable, i.e. neuron activation. Since the f_i do not explicitly depend on time, the system is said to be
autonomous. With dissipative systems, the flow in the phase space, characterized by the field of velocity vectors

$$\dot{x}(t) = (\dot{x}_1, \dot{x}_2, \ldots, \dot{x}_n)^T = (f_1, f_2, \ldots, f_n)^T \qquad (10)$$

is contracting, i.e. dv/dt < 0 for a volume element v of the phase space.


On account of dv/dt < 0 the volume element is mapped asymptotically onto a subspace of the phase space with volume zero.
This subspace is a so-called attractor.


There are two kinds of attractors: periodic attractors in the form of asymptotically stable limit cycles, and asymptotically
stable fix points, which are primarily of interest for stable ANN. These asymptotic solutions (fix points for t → ∞) do not depend
on the initial conditions. Moreover, the type of behaviour of a general nonlinear system, whether stable, unstable, oscillatory or
chaotic, depends critically on the input applied to it. For ANN, those nonlinear structures are therefore suitable that achieve
asymptotically stable fix points (attractors) for a large range of input patterns, i.e. a large input signal space (input vector
space). In these fix points, the knowledge contained in the input patterns can be stored, or the input signals can be classified.
This is accomplished by modification of the f_i during the learning (training) process of the ANN. Upon completion of learning
the w_ij are fixed. In the recall mode the system acts as a short-term memory (STM) dynamic system, i.e. a content addressable
memory (CAM). Based on this brief representation of nonlinear system behaviour, the network operation can be summarized as
delineated in Fig. 13.
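
The convergence to a fix point attractor can be made concrete with a small numerical sketch. It assumes a particular additive form of equation (9), dx_i/dt = -x_i + sum_j w_ij tanh(x_j) + I_i, integrated with the Euler method. The weights, inputs, step size and function name are hypothetical and were chosen small enough that the system has a single attractor; larger feedback weights can produce several fix points, in which case the result does depend on the starting point.

```python
import math


def simulate(x0, weights, inputs, dt=0.05, steps=2000):
    """Euler integration of dx_i/dt = -x_i + sum_j w_ij * tanh(x_j) + I_i."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        s = [math.tanh(v) for v in x]  # output signals of the PEs
        dx = [
            -x[i] + sum(weights[i][j] * s[j] for j in range(n)) + inputs[i]
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return x


# Hypothetical fixed weights (after learning) and a constant input pattern.
W_demo = [[0.0, 0.5], [0.5, 0.0]]
I_demo = [0.3, -0.2]

# Two different initial activations end in the same fix point.
a = simulate([2.0, -3.0], W_demo, I_demo)
b = simulate([-1.5, 4.0], W_demo, I_demo)
print(a, b)
```

Changing I_demo moves the fix point, which corresponds to the statement that the behaviour depends on the input applied to the system.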

FIGURE 13: Network operation


Applying the direct method of Lyapunov, different theorems can be utilized to prove stability for ANN with feedback connections.
A general design principle for absolutely stable information processing and memory storage by nonlinear feedback networks is
the Cohen-Grossberg theorem with the j-th PE activation dynamics as shown in Fig. 14. For this class of nonlinear feedback
systems two important cases for the PE's activation dynamics can be discerned: the so-called additive and shunting short-term
memory equations as shown in Fig. 15. These equations represent the basis for the design and analysis of a number of specialized
networks as applied to particular problems. An example of a stable network structure belonging to the Cohen-Grossberg class is
one with self-exciting recurrent connections and neighbour-inhibiting ones, the so-called competitive systems as shown in Fig. 16.


FIGURE 14: Stability theorems

Lyapunov functions: a fixed point $x^*$ is asymptotically stable if a scalar function $V(x)$, mapping $R^n \to R$, exists and

$$V(x) > V(x^*) \quad \forall x \neq x^* \quad \text{(positive definiteness)}$$
$$\dot{V}(x) < 0 \quad \forall x \neq x^* \quad \text{(negative definiteness)}$$

Cohen/Grossberg structures:

$$\dot{x}_j = a_j(x_j) \Big[ b_j(x_j) - \sum_{i=1}^{n} w_{ji}\, s_i(x_i) \Big]$$
$$\dot{w}_{ji} = -\alpha\, w_{ji} + s_j(x_j)\, s_i(x_i)$$


FIGURE 15: Cohen-Grossberg theorem; absolutely stable nonlinear feedback systems, STM equations

Legend: short-term memory activation x_i; nonlinear signals f_j, g_j; long-term memory adaptive weights w_ji

Additive STM equation (passive decay, positive feedback, negative feedback, input):

$$\dot{x}_i = -A_i x_i + \sum_{j=1}^{n} f_j(x_j)\, B_{ji}\, w_{ji}^{(+)} - \sum_{j=1}^{n} g_j(x_j)\, C_{ji}\, w_{ji}^{(-)} + I_i$$

Shunting STM equation (bounded activations plus automatic gain control):

$$\dot{x}_i = -A_i x_i + (B - x_i) \Big[ \sum_{j=1}^{n} f_j(x_j)\, C_{ji}\, w_{ji}^{(+)} + I_i \Big] - (x_i + D) \Big[ \sum_{j=1}^{n} g_j(x_j)\, E_{ji}\, w_{ji}^{(-)} + J_i \Big]$$


Furthermore, ANN are realized in structures which show only feedforward connections. These are inherently stable if they comprise
stable single elements, which is achievable by a corresponding output function.


FIGURE 16: Stability theorems (cont'd); competitive systems: network structures with self-exciting recurrent connections and
neighbour-inhibiting connections (system of differential equations; F: intraconnection weights)


In addition to global stability, convergence of a network plays an important role. The stability problem occurs in the recall
phase of ANN with feedback connections. The convergence concerns the minimization of the error between the desired and the
computed ANN output signal. For this reason, the convergence is of importance in supervised learning and must be specially
verified for each corresponding ANN model and the appropriate learning strategy.


It shall be mentioned that there are ANN applications which demand periodical attractors. The corresponding ANN are trained for
stable limit cycle oscillations.


Learning, Self-Organisation (Encoding)


Contrary to the conventional procedure, in which the solution of a problem must be available in the form of an algorithm, the way
to solve a definite task is self-organized and learned when an ANN is used. The ANNs with hard-wired encoding are an exception to
this. In their case, the knowledge of the problem and its solution is implemented in practice by the designing expert, who
determines the topology as well as the strength of the connections in advance.


In the case of self-organization, the neural net forms an internal cognitive model of the task and thus replaces the mathematical
description. This is done by generating the suitable meshing and weights. The problem here is the determination of the
modification strategy, which leads directly to the problem of learning.


As mentioned already, in the case of a massively parallel net (Fig. 12) the knowledge lies in the way the elements (PEs) are
linked as well as in the strength of the links (interconnection weights). If learning is understood as a modification of the
knowledge, the network interconnections can be changed in three ways: generation of new interconnections, loss of existing
interconnections and change of existing interconnection weights. From adaptive signal processing, parameter-adaptive and
structure-adaptive filter structures are known. For ANNs, so far only the weight factors of given ANN structures are modified.
By setting the interconnection weight to zero, an interconnection can be interrupted (which acts like a structural change), or
conversely its efficiency can be increased by increasing the weight factor.


As shown in Fig. 17, supervised and unsupervised learning (encoding) can be discerned. Supervised learning by error
backcoupling has already been explained in chapter 5 when dealing with simple adaptive processing elements. For multi-layer nets,
the method of error backcoupling fails. Here, the so-called back propagation algorithm must be used for supervised learning,
since for the PEs on hidden layers the desired output cannot be specified. In the example treated in chapter 9 a back
propagation ANN is considered in more detail. The reinforcement learning mentioned in Fig. 17 is looked at again in chapter 8
when dealing with the neuro control problem.


If no predefined training data are available, or if their generation is too time-consuming and costly, self-organizing nets must
be utilized that learn unsupervised. Based on local information and internal ANN control, the net self-organizes the presented
data and discovers its emergent collective properties. Unsupervised Hebbian learning (Donald Hebb, 1949) is important to many ANN
designs. From Fig. 18 it becomes evident that the Hebbian learning rule computes a correlation between the presynaptic signal
(s_i) and the postsynaptic activation (x_j), where a positive correlation (x_j s_i > 0) causes a weight increase. A passive decay
term (-α w_ji) is also often added in the equation for the weight change of w_ji. In each case, only local information is
required, as compared to error backcoupling or error back propagation, i.e. the presynaptic signal on the input path, the
postsynaptic activation of the PE and possibly the actual interconnection weight value. In many cases, the output signal
s_j(x_j) is used instead of the postsynaptic activation. For the class of Cohen-Grossberg structures mentioned before, the
so-called passive decay and gated decay unsupervised learning equations for the long-term memory weight adaptation can be
utilized (Fig. 19).
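
The Hebbian rule with a passive decay term can be sketched numerically as follows. The sketch assumes the form dw_i/dt = -α w_i + η x s_i, integrated by Euler steps. The postsynaptic activation x is supplied as data here rather than computed by the PE, and the sample values, parameter values and function name are hypothetical.

```python
def hebb_step(w, s, x, eta=1.0, alpha=1.0, dt=0.05):
    """One Euler step of dw_i/dt = -alpha * w_i + eta * x * s_i.

    w: weights of one PE, s: presynaptic signals, x: postsynaptic activation.
    Only local quantities (s_i, x, w_i) enter the update.
    """
    return [wi + dt * (-alpha * wi + eta * x * si) for wi, si in zip(w, s)]


# Hypothetical data: the activation x follows the first input path (x = s[0])
# and is uncorrelated with the second input path.
samples = [((1, 1), 1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]

w = [0.0, 0.0]
for _ in range(100):
    for s, x in samples:
        w = hebb_step(w, s, x)
print(w)
```

The weight on the correlated path settles near η/α times the mean product x s_i, while the weight on the uncorrelated path stays close to zero. Without the decay term the first weight would grow without bound.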


FIGURE 17: Long-term memory learning taxonomy

HARDWIRED (off-line)
- predetermined connections and weights

SUPERVISED (off-line)
- error correction feedback: back-coupling for 1-layer nets (LSE equivalent); back-propagation for multi-layer nets
- reinforcement learning: like error correction, however no error for each output PE, but overall performance (input and
  output layer)

UNSUPERVISED (real-time, on-line)
- direct: $\Delta w_{ji} \sim S(x_j) \cdot S(x_i)$
- differential: $\Delta w_{ji} \sim \dot{S}(x_j) \cdot \dot{S}(x_i)$
- on-center/off-surround interaction (input and output layer)


FIGURE 18: Unsupervised Hebbian learning (self-organization)

Activation / recall: input s_i, output of the j-th PE

Learning (LTM / encoding):

$$\dot{w}_{ji} = \eta\, x_j\, s_i$$
$$\dot{w}_j = \eta\, x_j\, s = \eta\, \langle w_j, s \rangle\, s$$

(measure of weight- and input-vector correlation)


Different learning paradigms make use of the so-called competitive learning. In its simplest version, competitive learning
works in combination with recall as shown in Fig. 20 (off-line, unsupervised). The weight vector w_j that matches best with the
input vector will yield the highest activation of the associated PE. This is called the winning PE, and only its input weight
vector (w_j), and none of the others, is adjusted in proportion to its Euclidean distance d from the input vector (s). In an
extension, the output layer PEs can compete with each other intra-layer by sending positive feedback signals to themselves
(recurrent self-excitation) and negative signals to all their neighbours (lateral neighbour inhibition). This type of
connection was already shown in Fig. 16 when mentioning competitive systems.
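
The simplest version of competitive learning can be sketched as follows. The sketch assumes that the best match is measured by the smallest Euclidean distance and that the winner is moved by a fraction η of its distance towards the input. The initial weights are fixed hypothetical values instead of small random ones so that the run is reproducible, and all names are illustrative.

```python
def winner(weights, s):
    """Index of the PE whose weight vector is closest (Euclidean) to input s."""
    dists = [sum((wi - si) ** 2 for wi, si in zip(w, s)) for w in weights]
    return dists.index(min(dists))


def competitive_step(weights, s, eta=0.2):
    """Adjust only the winning weight vector towards the input; return winner."""
    k = winner(weights, s)
    weights[k] = [wi + eta * (si - wi) for wi, si in zip(weights[k], s)]
    return k


# Hypothetical data: two groups of inputs near (0, 0) and near (1, 1).
inputs = [(0.0, 0.1), (1.0, 0.9), (0.1, 0.0), (0.9, 1.0)]
weights = [[0.4, 0.5], [0.6, 0.5]]  # two PEs, fixed start values

for _ in range(50):
    for s in inputs:
        competitive_step(weights, s)
print(weights)
```

After training, each weight vector sits near the centre of one input group, so the index returned by winner() acts as a class label for new inputs.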


FIGURE 19: Cohen-Grossberg theorem; STM and LTM design equations. A general design frame for absolutely stable information
processing and memory storage (CAM) by nonlinear competitive-cooperative feedback networks.


FIGURE 20: NNET; unsupervised learning example (W initialized by small random values)


Important self-organizing ANN models

a) Self-organizing Feature Map (SOFM)


The so-called self-organizing feature map introduced by Kohonen consists of a one- or two-dimensional arrangement of the PEs.
The structure is completely meshed and processes real-valued input signals. The PEs simply form the sum of their weighted
inputs. The modification of the interconnection weights is made according to the previously mentioned method of competitive
learning (Fig. 20). However, in addition to the weight vector of the winner PE, the weight vectors of a predetermined
neighbourhood of cells in the area N_c are also modified in this case. This area is reduced with increasing training time.


In Fig. 21 a portion of a one-dimensional SOFM network with a two-dimensional input (feature) space is shown (left). When an
input vector is presented, the winning network node (PE) is identified as the one with the minimum Euclidean distance between
input and weight vector of that node (see also Fig. 20). The weight vector of the winner (closest to the input vector) and those
of its neighbours, regardless of their values, are updated to learn the current input by moving closer to its position (Fig. 21,
right).
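
A single SOFM update on a one-dimensional chain of PEs can be sketched as below. The sketch assumes a neighbourhood defined by an index radius around the winner and a simple linear shrinking schedule; both choices, as well as the names and numbers, are hypothetical and only serve to illustrate the mechanism.

```python
def sofm_step(weights, s, eta, radius):
    """One SOFM update on a 1-D chain of PEs; returns the winner index.

    The winner and all PEs within `radius` chain positions of it are moved
    towards the input s, regardless of their current weight values.
    """
    dists = [sum((wi - si) ** 2 for wi, si in zip(w, s)) for w in weights]
    c = dists.index(min(dists))
    for k in range(len(weights)):
        if abs(k - c) <= radius:  # PE k lies in the neighbourhood of the winner
            weights[k] = [wi + eta * (si - wi) for wi, si in zip(weights[k], s)]
    return c


def radius_schedule(t, r0, t_max):
    """Neighbourhood radius shrinking from r0 to 0 as training step t grows."""
    return max(0, r0 - (t * (r0 + 1)) // t_max)


# Hypothetical chain of five PEs with 2-D weight vectors.
chain = [[0.1 * k, 0.5] for k in range(5)]
c = sofm_step(chain, s=(0.32, 0.9), eta=0.5, radius=1)
print(c, chain)
```

In a full training run, sofm_step would be called for every presented input with radius_schedule(t, r0, t_max) as the radius, so that early steps order the chain globally and late steps only fine-tune the winner.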
If the training vectors form individual boundaries in the feature space, the weight vectors will have adapted themselves after a
sufficient number of learning steps in such a way that they represent corresponding classes, i.e. topologically close processing
elements correspond to physically adjacent groups of input vectors (classes). That is, SOFMs can compute the probability density
function (PDF) of the input vectors and represent it implicitly by their own density.


Learning of SOFM is unsupervised, but learning phase and application phase are separated from each other. Thus, problems can
arise during the application phase in the case of a slow change of the input data or the occurrence of data not learned by the net.


FIGURE 21: Self-organizing feature map (SOFM); 1-dim SOFM with 2-dim feature space. Panels: topological ordering of nodes with
1-dim or 2-dim neighbourhoods of connectivity among nodes; weight (node) response upon presentation of an input.


b) Adaptive Resonance Theory (ART)


To mimic cognitive functions, autonomous self-organizing systems require means that are capable of learning, memory and
recognition in an unpredictable world with no teacher available. Corresponding computational units must continue to learn in a
stable fashion, where this new learning must not force unselective forgetting of previously acquired knowledge.


Grossberg and Carpenter (1987) designed the so-called ART network (Adaptive Resonance Theory) in order to solve the dilemma
between stability of the learned knowledge and plasticity, i.e. the capability of continued learning. ART networks are stable
enough to preserve significant previously acquired knowledge but nevertheless remain adaptable enough to incorporate new
information whenever it might appear. The basic idea which led to the ART was the discovery that a 3-layer net (Fig. 22) with
competitive learning can perform any mapping from input (feature) space R^n to output (category, class) space R^m. The ART net
can be imagined as a two-layer structure resulting from folding back the three-level network on itself as shown in Fig. 22.
Thus the simple ART module includes a bottom-up competitive learning stage in combination with a top-down outstar system, both
representing adaptive filters with associated LTM weights.


The main function of the ART is that the top-down attentive feedback encodes learned expectations (learned bottom-up) in
response to arbitrary temporal sequences of spatial input patterns in real time. A large enough mismatch at level F_1 quickly
resets the F_2 code before new learning can occur by triggering the orienting signal (Fig. 22). The F_2 code is reset if the
degree of match is smaller than a predetermined vigilance parameter. In this basic configuration of the ART, stored patterns can
be permanently updated on the one hand and, on the other hand, additional pattern classes in the net can be generated if the
input pattern has no similarity to existing pattern classes.


The different methods of learning and self-organisation possess particular important characteristics (Fig. 23) which have to be
considered when selecting the appropriate network for a given application. In Fig. 23 ARTMAP is a new architecture using
multiple ARTs in a network hierarchy with supervised associative learning. Also the Vector Associative Map (VAM) network is a new
design for fast unsupervised real-time error-based learning. It might play an important role in sensory-motor control type
problems. In its key features it is complementary to the ART net.
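
The role of the vigilance parameter can be illustrated by a strongly simplified, ART1-style clustering of binary patterns. This is a sketch under several assumptions, not the ART1 network itself: candidate categories are ordered by plain overlap with the input, the degree of match is the overlap divided by the number of active input components, and a matching prototype is updated by a component-wise AND (fast learning). All names and data are hypothetical.

```python
def art_cluster(patterns, vigilance):
    """Simplified ART1-style clustering of binary patterns (lists of 0/1).

    Returns (prototypes, labels). A pattern joins the best matching category
    that passes the vigilance test; otherwise a new category is created.
    """
    prototypes = []
    labels = []
    for p in patterns:
        norm_p = sum(p)

        def overlap(k):
            return sum(a & b for a, b in zip(prototypes[k], p))

        # Bottom-up competition: try categories with the largest overlap first.
        order = sorted(range(len(prototypes)), key=lambda k: -overlap(k))
        chosen = None
        for k in order:
            # Vigilance test: match between input and stored expectation.
            if norm_p > 0 and overlap(k) / norm_p >= vigilance:
                chosen = k
                break
            # Mismatch: this category is reset and the next one is tried.
        if chosen is None:
            prototypes.append(list(p))  # generate an additional pattern class
            chosen = len(prototypes) - 1
        else:
            # Update the stored pattern, keeping only the shared components.
            prototypes[chosen] = [a & b for a, b in zip(prototypes[chosen], p)]
        labels.append(chosen)
    return prototypes, labels


# Hypothetical binary input patterns.
data = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
protos_low, labels_low = art_cluster(data, vigilance=0.6)
protos_high, labels_high = art_cluster(data, vigilance=0.9)
print(labels_low, labels_high)
```

With the lower vigilance the last pattern is accepted by the first category, whereas the higher vigilance rejects it and opens a third category. This shows how the parameter trades coarse against fine categories.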


FIGURE 22: ART1 network


ANN Recall Operation


In the preceding chapter, the gathering and storage of information and knowledge in an ANN was treated in some detail. The
recall concerns the retrieval of information stored in ANN, i.e. the STM function. The recall procedure is in general given by
the solution of an activation equation in connection with a particular output function as shown in general form in Fig. 8.
Similarly as for learning, there are some basic recall paradigms, from which the additive and the shunting STM equations for
Cohen-Grossberg structures were given in Fig. 15.


Basic functions of trained ANN


As already mentioned, ANNs are actually content addressable memories (CAM) which either recall stored information or encode
new input information, self-contained or supervised by a teacher. Applying ANNs, the following basic functions can be performed
(see also Fig. 13):

Auto-association: With a noisy or incomplete input pattern (vector), the undisturbed complete input pattern is generated at the
output.

Hetero-association: Input pattern and associated output pattern are different.

Classification: Each input pattern is assigned to one of several classes which are defined by the output pattern.

In all three cases, the storage of associative mappings is concerned. There are many applications where system elements must be
described by such stimulus-response type mappings, including linear, nonlinear, logical or binary ones.

ANN Summary

While Fig. 13 shows the operational processes, the main features are summarized again in the following. Artificial neural nets

- are computers that learn how to solve problems
- base their problem solving on sample data and learning mechanisms
- do not require expert knowledge representation, logical inferencing schemes, statistical algorithms or a specialist/analyst
  to develop and code a solution
- are trained to identify on their own the key features and associations enabling them to distinguish different patterns
- can learn on-line in real time or can be trained off-line by a sample data set
- do require an appropriate architecture with sufficient capacity and a paradigmatic learning/training scheme
- consist of three major elements: organized topology of interconnected processing elements, method of encoding information,
  method of recalling information.

Their strengths and weaknesses are summarized as follows:

Strengths:

- unique solutions based on user data examples
- no need to know algorithms
- less/no software needed, more hardware-processing power required
- provides solutions to problems such as pattern matching and recognition, data compression, near-optimal solutions to
  optimization problems, non-linear system modelling and control, function approximation etc.
- inherent parallel processing structure yields faster solutions to a number of computation-intensive problems
- internal generation of complex decision areas by means of non-linear combination of input vector components
- robust performance in view of noisy and disturbed input signals
- inherently fault-tolerant

ANN weaknesses are that they are not applicable to all processing problems and do require training and test data examples, with
a few exceptions.

A comparison of ANN with conventional digital computers is summarized in Table 1. This leads directly to some remarks regarding
the utilization of ANNs.


TABLE 1: COMPARISON CONV. DIGITAL vs. NEURAL PROCESSING

FEATURE: Processing order
  DIGITAL COMPUTER: Programs with serially performed instructions
  NEURAL PROCESSING: Parallel programs with comparatively few steps

FEATURE: Knowledge storage
  DIGITAL COMPUTER: Static copy of knowledge is stored in addressed memory location; new information destroys old information
  NEURAL PROCESSING: Information stored in the interconnection of neurons; knowledge adapted by changing interconnection strength

FEATURE: Processing control
  DIGITAL COMPUTER: Central processing unit monitors all activities and has access to global information, creating a processing
    bottleneck and critical point of failure
  NEURAL PROCESSING: No control nor monitoring of a neuron's activity; a neuron's output is only a function of its locally
    available information from interconnected neighbours

FEATURE: Fault tolerance
  DIGITAL COMPUTER: Removal of any processing component leads to a defect; corruption of memory is irretrievable and leads to a
    failure (fault-intolerant)
  NEURAL PROCESSING: Distributed knowledge/information representation across many neurons and their interconnections; if a
    portion of the neurons is removed, information is retained through redundant distributed encoding (fault-tolerant)
General remarks for ANN application


Concerning the potential application it can be said that in many problems with only little or almost no knowledge existing on
the object concerned, or where the parameters and states of this object can be described neither mathematically nor by rules and
facts in a reasonably reliable manner, the development of sequential algorithms for conventional processors is extremely
difficult. The necessary expenditure of cost and time for the algorithm and software development, verification and validation is
correspondingly high.


Contrary to sequential conventional information processing, the processing of information utilizing neural nets offers in
general considerable advantages for all applications which are characterized by limited knowledge on the object. In contrast
with programmed sequential computing, ANN can be applied successfully for the solution of problems with inexact and incomplete or
even contradictory input data.


The ability of neural nets to learn by examples (training patterns) or even unsupervised is of particular importance. It is not
necessary to program a task-specific function or information. If representative example data are available in sufficient number
and the net is trained with these data, the net can, due to its generalization property, tolerate input data which are
superimposed by noise and disturbances when recognizing the input patterns.


By the use of non-linear processing elements in the network, multi-level nets can form complex decision areas in the feature
space. This corresponds mathematically to a non-linear mapping of the input vector space onto that of the output vector. This
also allows the modelling of non-linear systems.
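
How non-linear PEs in several layers form a decision area that no single straight line can provide is shown by the following minimal sketch of a two-layer net for the XOR function. The weights and thresholds are chosen by hand (hard-wired) purely for illustration, and the function names are hypothetical.

```python
def step(x):
    """Hard-limiting non-linear output function of a PE (0 or 1)."""
    return 1 if x >= 0 else 0


def xor_net(s1, s2):
    """Two-layer net with hand-chosen weights that computes XOR."""
    h1 = step(s1 + s2 - 0.5)    # hidden PE 1: on if at least one input is on
    h2 = step(s1 + s2 - 1.5)    # hidden PE 2: on only if both inputs are on
    return step(h1 - h2 - 0.5)  # output PE: on if h1 is on and h2 is off


outputs = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

Each hidden PE contributes one straight line in the input plane; the output PE combines the two half-planes into the band between them, which is the decision area of XOR.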

FIGURE 24: G.a.C. applications of artificial intelligence and neural computing. Intelligent activity chain: situation
recognition, definition of objectives, generation of favourable solutions, decision making, action plan, action
execution/control (personnel training).


7. Categories for ANN application


It has already been mentioned (chapter 2) that G.a.C. problems extend over several hierarchically structured levels. In order to
perform G.a.C. functions on these levels, the implementation of a controlling feedback chain typical of all G.a.C. levels is
required. Any technical apparatus which implements the feedback chain in a real-time autonomous system requires the solution of
perception problems as associated with the sensors, and of cognition problems (e.g. recognition, hypothesis testing) as far as
the remaining functions are concerned. Very often in such systems, exploratory, goal-oriented actions will be performed,
resulting in a perception-cognition-action-recognition cycle.


It has been shown (Fig. 3) that for the implementation of such quasi-mental functions elements of artificial intelligence are
required. In addition to more conventional expert system techniques, ANN will gain increasing importance within this scope.
Therefore, as shown in Fig. 24, the application potential for ANN covers many areas, extending from relatively simple
applications in intelligent sensor and actuator systems to highly complex mission and scenario management problems.

Areas which represent potential categories for successful ANN application and which recur in many G.a.C. systems are the
following:

- pattern recognition, signal classification
- associative memories
- self-organisation, learning
- knowledge acquisition, adaptive expert systems
- adaptive signal processing
- control, stabilization, guidance
- decision finding
- optimization procedures
- integration and fusion of multiple sensor data
- robotics, sensory-motor control

It falls beyond the scope of this paper to treat these categories here in more detail. On account of the importance for G.a.C.,
some further considerations concerning neuro-control should, however, be made.


8.  Neuro  Control 


The application of ANN for control, stabilization and guidance of objects can be considered as a further step in the evolution
of control techniques to face up to the challenges within the scope of more complex systems which require more adaptation and
self-organization capabilities. Thereby, the main problem is concerned with the real-time control of objects which are nonlinear
and noisy and whose dynamics are time-varying, only incompletely known or even unknown.


As common to all ANN, a characteristic feature of the neuro controllers is that they are not programmed but trained, either
supervised off-line or unsupervised on-line.


As a generalized example, the structure of a fault-tolerant, adaptive/learning neuro control system especially suitable for
applications on the lower levels of the G.a.C. systems hierarchy (missiles, manned/unmanned air vehicles, robotics, mobile robots
etc.) is shown in Fig. 25. As can be seen from this example, neuro-control systems can include subsystems for pattern
recognition in sensor data, failure detection and identification, dynamic modelling etc. which are realized as ANN but are only
of secondary importance for the actual neuro-control problem.


Learning mechanisms based on error backcoupling as shown in Fig. 10 are less suitable for many neuro-control applications, since
they require a reference signal for supervised learning for the outputs of each single ANN element (PE). These reference signals
are often not available from the natural environment.

FIGURE 25: Neuro-control


The unsupervised Hebbian learning is often also not applicable to neuro-control. As already mentioned and shown in Fig. 18, it
takes information for the adaptation of the connection weights from the network itself (PE activation x_j, PE input s_i). For
many ANN applications this is a very favourable characteristic, since no communication with the outside world is required for
learning. However, this can be a serious disadvantage for neuro-control applications if a particular performance criterion must
be met which is referenced to the environment.


The reinforcement learning paradigm (Fig. 26) takes this circumstance into account. Thereby, the reinforcement (r_k) is a
measure for the change of a behaviour or performance criterion and thus considers the success or failure of a control action. The
eligibility (e_i) of a synaptic pathway is a function of the product of the signal on this pathway and the output of the
corresponding PE, looked at over a particular delayed period of time. Thus, the eligibility is a measure of the extent to which
the input signal on a synaptic connection has also led to a large output signal. The eligibility should decay (for example
exponentially, Fig. 26) unless another high value of the eligibility results from the simultaneous occurrence of an input signal
and the resulting PE activation. Reinforcement learning is formally similar to Hebbian learning if the PE activation (x_j) and
the PE input (s_i) are replaced by reinforcement (r) and eligibility (e).
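
The interplay of eligibility and reinforcement can be sketched in a few lines. The sketch assumes an exponentially decaying eligibility trace of the product of pathway signal and PE output, and a weight change Δw_i = η r e_i, i.e. the Hebbian form with (r, e) in place of (x, s). The decay factor, learning rate, function names and the signal sequence are hypothetical.

```python
def update_eligibility(e, s, y, decay=0.8):
    """Exponentially decaying trace of the product (pathway signal * PE output)."""
    return [decay * ei + (1.0 - decay) * si * y for ei, si in zip(e, s)]


def reinforce(w, e, r, eta=0.5):
    """Hebbian-like update with reinforcement r and eligibility e."""
    return [wi + eta * r * ei for wi, ei in zip(w, e)]


w = [0.0, 0.0]
e = [0.0, 0.0]

# Time step 1: pathway 0 is active and the PE responds with output 1.
e = update_eligibility(e, s=[1, 0], y=1)
# Time steps 2 and 3: no activity, so the eligibility only decays.
e = update_eligibility(e, s=[0, 0], y=0)
e = update_eligibility(e, s=[0, 0], y=0)

# A delayed failure signal (r = -1) arrives from the environment.
w = reinforce(w, e, r=-1.0)
print(e, w)
```

Only the pathway that contributed to the earlier output is still eligible when the delayed reinforcement arrives, so only its weight is reduced.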


There are a number of neuro-control paradigms applicable to the design of the actual neuro-controller. The interested reader
must refer to the available literature.


As a front-end problem of neuro-control, relevant data and facts from similar and/or dissimilar sensor information are to be
obtained. Therefore, in two examples, ANN designs for multiple redundant sensor data failure detection and identification as
well as for target identification are briefly presented in the following chapter.


FIGURE 26: Reinforcement learning; activation/recall with reinforcement input; e_i,k: eligibility of the i-th pathway at time t_k


9.  Example 


As a first example, the fault-tolerant measurement of the proprio-specific motion state of an air vehicle is considered. For
this purpose, a number of redundant sensors for the angular rate (e.g. gyros) as well as for the linear acceleration (accelerometers) are used.
In order to meet the reliability and fault tolerance requirements with a minimum number of sensors, the arrangement of the sensors is skewed
such that each sensor monitors several axes of the air vehicle. The problem now is to detect faults and performance degradation and to
localize the possibly defective sensor among the redundantly available ones.


The block diagram of the signal processing elements required for this purpose is shown in Fig. 27. The measurement vector m
comprises the sensor outputs and is a function of the real physical motion state. Moreover, the available measurement contains
contributions due to step-, ramp- or stochastic-type failures, represented by the failure vector E.

In a first ANN element, so-called validation or feature vectors v = (v_1, v_2, ..., v_n) are determined by a projection of the
measurement vector m. In the case of a specified fixed sensor geometry, a hard-wired ANN (Fig. 27, left network part) can be used in which the
connection weights represent the projection mapping P.

The second ANN element performs the fault detection and localization, which corresponds to a classification. The output signals
calculated by the network and collected in the classification vector u indicate which sensor is defective, i.e. which class the
present input feature vector belongs to. Correspondingly, only one of the outputs (u_i in Fig. 27) has a high activation (≈ 1) while the outputs
of the others are small. In the following, two network models are considered for the classification ANN module.


[Figure 27: FDIR network. Block diagram; failure detection and localization.]


Optimal Linear Associative Memory (OLAM)


If the determination of the feature vector (v) is the result of a mathematically exact modelling of the relation between
characteristics and classes, an OLAM can be used. The process of encoding (learning) the input information is then reduced to the a priori
calculation of the optimal weight matrix which maps the input vector v onto the classification vector u. The optimal weight matrix W,
yielding the least mean square correlation between input (v_k) and output (u_k) vector pairs (k = 1, 2, ..., m), is computed from the pseudo-inverse
of the matrix X as shown in Fig. 27. Here, X = (v_1, v_2, ..., v_m). A recursive algorithm can be utilized, for example, to
compute the pseudo-inverse of X. Recall is simply a multiplication of v by the optimal weight matrix W.
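
The encoding and recall steps can be illustrated with a small sketch. It assumes that W = U X^+, where the columns of U are the desired classification vectors u_k, and that the stored feature vectors are linearly independent, so that the pseudo-inverse reduces to X^+ = (X^T X)^{-1} X^T. This closed form stands in for the recursive algorithm mentioned in the text; all function names are hypothetical.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("feature vectors must be linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def olam_weights(features, targets):
    """Encoding: W = U X^+ for m pairs (v_k, u_k) given as lists of vectors."""
    X = transpose(features)                       # n x m, columns are v_k
    U = transpose(targets)                        # p x m, columns are u_k
    Xt = transpose(X)                             # m x n
    X_pinv = matmul(inverse(matmul(Xt, X)), Xt)   # m x n
    return matmul(U, X_pinv)                      # p x n


def recall(W, v):
    """Recall: a single matrix-vector product u = W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


features = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]   # two stored feature vectors
targets = [[1.0, 0.0], [0.0, 1.0]]              # one output PE per class
W = olam_weights(features, targets)
u = recall(W, features[0])                      # close to [1, 0]
```

Because the weights are computed once in closed form, no iterative training is involved, which is why the resulting network can be hardwired.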


With OLAM, the total ANN for the fault detection and localization is hardwired as shown in Fig. 27. It is a three-layer network. The
number of input PEs corresponds to the number of sensor signals, the number of PEs in the hidden layer to the dimension of the feature vectors,
and the number of PEs in the output layer to the number of defects or failures to be localized.


Back-propagation  network 


It has already been mentioned that for the case of linearly inseparable classes multilayered nets must be used for classification.
The network model most widely used for this kind of application is the backpropagation network. Its function and the associated equations
are briefly reviewed here (Fig. 28).

Input and output variables are scaled. The PEs of the input layer merely memorize the present input signal. Each input layer PE is
connected with all PEs in the hidden layer. The latter multiply the input signals as well as the bias with the associated weight factors
and sum the results:

    x_j = \sum_i v_i w_{ij} + w_{0j}        (13)

[Figure 28: Backpropagation ANN with input layer, hidden layer and output layer; outputs u_1, u_2, ..., u_m.]


The output signal h_j of the j-th hidden PE is the sigmoid function of its activation x_j,

    h_j = \sigma(x_j)        (14)

with

    \sigma(x) = 1 / (1 + e^{-x})        (15)


The same functions are performed by each PE of the output layer, and these are also completely interconnected with the PEs of
the hidden layer. There are no intra-layer connections. Therefore the backpropagation ANN is a feed-forward structure in which each element
of a subsequent layer receives inputs from all elements of the preceding layer.


The learning is performed by adapting the connection weights in such a way that the sum of the squares of the errors between the
network output variables (u) and the desired output variables (d) of a set of training data is minimized.

Let us assume that there are M input/output vector pairs for the training. Initially, the weights are set to small random
values. After the processing of the m-th training data pair, the weights are adapted as follows:

    w^{(m)} = w^{(m-1)} + \Delta w^{(m)}        (16)

where \Delta w^{(m)} for the weights between hidden and output layer becomes

    \Delta w_{jk}^{(m)} = \eta \, \sigma'(x_k) \, (d_k^{(m)} - u_k^{(m)}) \, h_j        (17)

and for the weights between input and hidden layer

    \Delta w_{ij}^{(m)} = \eta \, v_i \, \sigma'(x_j) \sum_{k} \sigma'(x_k) \, (d_k^{(m)} - u_k^{(m)}) \, w_{jk}        (18)

Here, \eta is again a measure of the learning rate, \sigma'(.) is the derivative of the output function, h_j is the output of the j-th hidden PE,
x_j and x_k are the activations of the hidden and output PEs, and v_i is the i-th input signal.

As shown by equation (18), the error between actual and desired output is backpropagated from the output-layer to the hidden-layer PEs.
Furthermore, there is a weight transport from output to hidden layer.
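
One training step of such a network can be sketched as follows. This is a minimal illustration of the standard backpropagation update for one hidden layer with sigmoid PEs, which is assumed to be what equations (13) to (18) describe. The weight layout (bias weight stored first), the learning rate and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_out, scale=0.1):
    """Small random weights; each row is [bias, w_1, ..., w_n_in]."""
    return [[random.uniform(-scale, scale) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(v, w_hid, w_out):
    """Feed-forward recall: weighted sum plus bias, then sigmoid, per layer."""
    h = [sigmoid(w[0] + sum(wi * vi for wi, vi in zip(w[1:], v))) for w in w_hid]
    u = [sigmoid(w[0] + sum(wj * hj for wj, hj in zip(w[1:], h))) for w in w_out]
    return h, u


def train_step(v, d, w_hid, w_out, eta=0.5):
    """Adapt the weights in place for one pair (v, d); return the squared error."""
    h, u = forward(v, w_hid, w_out)
    # Output-layer error terms; sigma'(x) = sigma(x) * (1 - sigma(x)).
    delta_out = [uk * (1.0 - uk) * (dk - uk) for uk, dk in zip(u, d)]
    # Hidden-layer error terms: output errors transported back through the
    # (not yet updated) hidden-to-output weights.
    delta_hid = [hj * (1.0 - hj) *
                 sum(dk * w_out[k][j + 1] for k, dk in enumerate(delta_out))
                 for j, hj in enumerate(h)]
    for k, dk in enumerate(delta_out):
        w_out[k][0] += eta * dk
        for j, hj in enumerate(h):
            w_out[k][j + 1] += eta * dk * hj
    for j, dj in enumerate(delta_hid):
        w_hid[j][0] += eta * dj
        for i, vi in enumerate(v):
            w_hid[j][i + 1] += eta * dj * vi
    return sum((dk - uk) ** 2 for dk, uk in zip(d, u))
```

The hidden-layer error terms are computed before any weight is changed, so the weight transport uses the same weights that produced the output.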

The total network for fault detection and localization (see also Fig. 27) is shown in Fig. 29. As can be seen, the backpropagation
ANN is preceded by the network for the generation of feature vectors already introduced in Fig. 27.

[Figure 29]

The backpropagation part has been trained with training data in approx. 2,000 supervised learning steps. Tests with test
data sets showed very good results even with very noisy feature vectors.

The optimization of the number of PEs in the hidden layer is generally a problem of the backpropagation net. The numbers of PEs in the input
and output layers are determined by the dimensions of the feature and classification vectors.


For the classification of different targets in infra-red (IR) images, a classifier has been designed on the basis of a backpropagation
ANN and compared with the results obtained with a polynomial classifier. The superiority of the neural classifier becomes evident from Fig. 30,
where the detection rate is plotted against the false alarm rate for both classifiers. Finally, it should be mentioned that the design and the
training of the neural classifier require far less expense than the development of the conventional one.

[Figure 30]


10.  Implementation of neural nets


The following is a brief summary of the possibilities available today for the realization or implementation of ANN:

•  Software realizations for existing computers (supercomputers, massively parallel computers, conventional computers)
   which are in principle not designed for ANN implementations. Here, the ANN is mapped by virtualization onto
   systems and structures in which none or not all of the connections and processing elements of the ANN physically exist,
   i.e. they appear as memory areas and/or program structures.

•  Electronic implementations which are specifically designed for ANN signal processing (bus-related
   processors, co-/attached processors, special integrated circuits). Analog devices are also promising for high-speed ANN
   implementations.

•  Electro-optical or purely optical realizations. These will probably gain great importance in the future.


11.  Final  Remarks 


Concerning artificial neural nets, there is at present great euphoria. Looked at soberly, however, it cannot be neglected that
there is a whole variety of unsettled questions requiring intensive research. Taking into account the state of knowledge obtained so far, and
being aware of the still unsettled questions, ANN can already be used profitably for particular tasks today.


The biological nerve system is the living example of the fact that strongly meshed systems of an extremely high order can adopt
stable states. Moreover, without supervised control, these biological systems are able to act purposively and in a task-oriented manner. Through
an extensive comprehension of the biological paradigm, the brain, we must strive to recognize the regularities which might be of decisive use to us
for the stabilization and self-organization of highly integrated complex dynamic systems.


SUMMARY

Building intelligent systems that can model human behavior has captured the attention of
the world for years. So, it is not surprising that a technology inspired by the mind and brain
such as neural networks has generated great interest. This chapter will provide an evolutionary
introduction to neural networks by beginning with the key elements and terminology of
neural networks and then developing the topologies, learning laws and recall dynamics from
this infrastructure. The perspective taken in this paper is largely that of an engineer, emphasizing
the application potential of neural networks and drawing comparisons with other techniques
that have similar motivations. Mathematics will be relied upon in many of the
discussions to make points as precise as possible.


1.  OVERVIEW  OF  PAPER 

This paper begins with a review of what neural networks are and why they are so
appealing. A typical neural network is immediately introduced to illustrate several of the key
features. Then, the fundamental elements of a neural network such as input and output patterns,
the processing element, connections, and threshold operations are described, followed by
descriptions of neural network topologies, learning algorithms, and recall dynamics. Next,
a taxonomy of neural networks is presented that uses two of their key characteristics: learning
and recall. Finally, a comparison of neural networks and similar non-neural information
processing methods is presented.

2.  WHAT ARE NEURAL NETWORKS AND WHAT ARE THEY GOOD FOR?

Neural networks are information processing systems. In general, neural networks can be
thought of as “black box” devices that accept inputs and produce outputs. Some of the
operations that neural networks perform include:

•  classification  -  an  input  pattern  is  passed  to 
the  network  and  the  network  produces  a 
representative  class  as  output. 

•  pattern  matching  -  an  input  pattern  is  passed 
to  the  network  and  the  network  produces 
the  corresponding  output  pattern. 

•  pattern  completion  -  an  incomplete  pattern 
is  passed  to  the  network  and  the  network 
produces  an  output  pattern  that  has  the 
missing  portions  of  the  input  pattern  filled 
in. 

•  noise removal - a noise-corrupted input pattern is presented to the network and the
network removes some (or all) of the noise and produces a cleaner version of the input
pattern as output.

•  optimization - an input pattern representing the initial values for a specific optimization
problem is presented to the network and the network produces a set of variables that
represent a solution to the problem.

•  control - an input pattern represents the current state of a controller and the desired
response for the controller and the output is the proper command sequence that will
create the desired response.

Neural networks consist of layers of processing elements and weighted connections.
Each layer in a neural network consists of a collection of processing elements (PEs). Each PE
collects the values from its input connections, performs a predefined mathematical
operation (typically a dot-product followed by a threshold), and produces a single output
value.

Figure 1 illustrates a typical neural network with three layers denoted Fx, Fy, and Fz. The
bottom layer, Fx, accepts inputs into PEs x1, x2, x3. A collection of weighted connections
(sometimes called “weights” or “connections”) connect the Fx PEs to the Fy PEs. The Fy PEs,
y1 and y2, are the hidden layer. Similarly, the Fy PEs are connected to the Fz PEs, which form
the output layer. The weight names serve as both a label and a value. As an example, in
Figure 1 the connection from the Fx PE x1 to the Fy PE y2 is the connection weight w12 (the
connection from x1 to y2). By adjusting the connection weights, information is stored in the
network. The values of the connection weights are often determined by a neural network
learning procedure (although sometimes they are predefined and hardwired into the network).
By performing the update operations for each of the PEs the neural network recalls information.

There  are  two  important  features  illustrated 
by  the  neural  network  shown  in  Figure  1  that 
apply  to  all  neural  networks: 

•  Local Operations. Each PE acts independently of all others. A PE’s output relies
only on its constantly available inputs from the abutting connections. The information
provided by the adjoining connections is all a PE needs to process. Information from
other PEs where an explicit connection does not exist is not necessary.

•  Distributed Representation. The large number of connections provides a large amount
of redundancy and facilitates a distributed representation. A large number of connections
must be eliminated for a significant amount of information to be destroyed.

The first feature allows neural networks to operate efficiently in parallel. The last feature
provides neural networks with inherent fault-tolerance and generalization qualities that are
very difficult to attain from typical computing systems. In addition to these features, neural
networks can learn arbitrary nonlinear mappings given the proper topology, nonlinear
processing elements from nonlinear threshold operations, and appropriate learning rules. The
ability to learn nonlinear mappings simply by presenting instances of input and output
patterns is a powerful attribute shared by few systems.

There  are  three  primary  situations  where 
neural  networks  are  useful: 

•  Situations where only a few decisions are required from a massive amount of data
(e.g. speech and image processing).

•  Situations where nonlinear mappings must be automatically acquired (e.g. loan
evaluations and robotic control).

•  Situations where a near-optimal solution to a combinatorial optimization problem is
required very quickly (e.g. airline scheduling and telecommunication message routing).

To summarize, the foundations of neural networks consist of an understanding of the
nomenclature and a firm comprehension of the rudimentary mathematical concepts used to
describe and analyze neural network processing. In a broad sense, neural networks consist
of three principal elements:

•  Topology. A neural network’s organization into interconnected layers.

•  Learning. The adjustment of weights to store information.

•  Recall. Retrieving information stored in the weights.

Sections 4, 5, and 6 describe each of these elements, respectively. Prior to these
discussions, Section 3 will address the fundamental components used to create a neural network:
connections, processing elements, and threshold functions.


3.  DISSECTING  NEURAL  NETWORKS 

A convenient neural network analogy is the directed graph, where the edges and nodes
correspond to weights and PEs, respectively. In addition to connections and processing
elements, threshold functions and input/output patterns are also basic elements in the design,
implementation and use of neural networks. After a description of the terminology used to
describe neural networks, each of these elements will be examined in turn.

3.1.  Terminology 

Unfortunately,  neural  network  terminology 
remains  varied,  with  a  standard  yet  to  be 
adopted  (although  there  is  an  effort  to  create 
one,  cf.  Eberhart,  1990).  To  illustrate  some  of 
the  terminology  introduced  here,  please  refer  to 
Figure  2. 

Input and output vectors (patterns) are denoted by subscripted capital letters from the
beginning of the alphabet. The m input patterns are denoted as A_k = (a_k1, a_k2, ..., a_kn);
k = 1, 2, ..., m, and the output patterns as B_k = (b_k1, b_k2, ..., b_kp); k = 1, 2, ..., m.

The PEs in a layer will be denoted by the same subscripted variable. The collection of
PEs in a layer form a vector and these vectors will be denoted by capital letters from the end
of the alphabet. In most cases three layers of PEs will suffice. The input layer of PEs is
denoted as Fx = (x1, x2, ..., xn), where each xi receives input from the corresponding input
pattern component a_ki. The next layer of PEs will be the Fy PEs, then the Fz PEs (if either
layer is necessary). The dimensionality of these layers depends on their use. Using the network
in Figure 2 as an example, the second layer of the network is the output layer, hence the
number of Fy PEs must match the dimensionality of the output patterns. In this instance, the
output layer is denoted as Fy = (y1, y2, ..., yp), where each yj is correlated with the j’th
element of B_k.

Connection weights are stored in weight matrices. Weight matrices will be denoted by
capital letters toward the middle of the alphabet, such as U, V, and W. Referring to the
example in Figure 2, this two layer neural network requires one weight matrix to fully connect
the layer of n Fx PEs to the layer of p Fy PEs. The matrix shown in Figure 2 describes the full
set of connection weights between Fx and Fy, where the weight wij is the connection weight
from the i’th Fx PE, xi, to the j’th Fy PE, yj.

3.2.  Input  and  Output  Patterns 

Neural networks cannot operate unless they have data. Some neural networks require
only single patterns and others require pattern pairs. Note that the dimensionality of the input
pattern is not necessarily the same as the output pattern. When a network only works with
single patterns, it is an autoassociative network. When a network works with pattern pairs it is
heteroassociative.

One of the key issues when applying neural networks is determining what the patterns
should represent. For example, in speech recognition there are many different types of
features that can be employed (Lippmann, 1989), including linear predictive coding coefficients,
Fourier spectra, histograms of threshold crossings, and cross-correlation values. The proper
selection and representation of these features can greatly affect the performance of the
network.

In some instances the representation of the features as a pattern vector is constrained by the
type of processing the neural network can perform. Some networks can only process binary
data, such as the Hopfield network (Hopfield, 1982; Amari, 1972), Binary Adaptive
Resonance Theory (Carpenter & Grossberg, 1987a), and the Brain-State-in-a-Box (Anderson, et al.,
1977). Others can process real-valued data such as backpropagation (Werbos, 1974; Parker,
1982; Rumelhart, Hinton, & Williams, 1986) and Learning Vector Quantization (Kohonen,
1984). Creating the best possible set of features and properly representing those features is the
first step toward success in any neural network application (Anderson, 1990).

3.3.  Connections 

A neural network is equivalent to a directed graph (digraph). A digraph has edges
(connections) between nodes (PEs) that allow information to flow in only one direction (the
direction denoted by the arrow). Information flows through the digraph along the edges and is
collected at the nodes. Within the digraph representation, connections determine the direction
of information flow. As an example, in Figure 2 the information flows from the Fx layer
through the connections, W, to the Fy layer. Neural networks extend the digraph
representation to include a weight with each edge (connection) that modulates the amount of
output signal passed from one node (PE) down the connection to the adjacent node. For
simplicity, the dual role of connections will be employed. A connection both defines the
information flow through the network and modulates the amount of information passing
between two PEs.

The connection weights are adjusted during a learning process that captures information.
Connection weights that are positive valued are excitatory connections. Those with negative
values are inhibitory connections. A connection weight that has a zero value is the same
as not having a connection present. By only allowing a subset of all the possible connections
to have non-zero values, sparse connectivity between PEs can be simulated.

It is often desirable for a PE to have an internal bias value (threshold value). Panel (a)
of Figure 3 shows the PE yj with three connections from Fx, {w1j, w2j, w3j}, and a bias
value, θj. It is convenient to consider this bias value as an extra connection, w0j, emanating
from the Fx PE x0, with the added constraint that x0 is always equal to 1 as shown in panel (b).
This mathematically equivalent representation simplifies many discussions. Throughout the
paper this method of representing the bias (threshold) values will be employed.
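
The equivalence of the two representations can be checked numerically. The following sketch is illustrative only; the function names and sample values are hypothetical.

```python
def net_input_with_bias(x, w, theta):
    """Panel (a): weighted sum of the inputs plus an explicit bias value."""
    return sum(xi * wi for xi, wi in zip(x, w)) + theta


def net_input_augmented(x, w, theta):
    """Panel (b): bias folded in as weight w_0 on a constant input x_0 = 1."""
    x_aug = [1.0] + list(x)
    w_aug = [theta] + list(w)
    return sum(xi * wi for xi, wi in zip(x_aug, w_aug))


x = [0.2, -0.4, 0.9]
w = [0.5, 0.1, -0.3]
same = abs(net_input_with_bias(x, w, 0.7) - net_input_augmented(x, w, 0.7)) < 1e-12
```

With the augmented form, a learning rule that updates weights also updates the bias, with no special case.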

3.4.  Processing  Elements 

The processing element (PE) is the portion of the neural network where all the computing
is performed. Figure 3 illustrates the most common type of PE. A PE can have one input
connection, as is the case when the PE is an input layer PE and it receives only one value from
the corresponding component of the input pattern, or it can have several weighted connections,
as is the case of the Fy PEs shown in Figure 2 where there is a connection from every Fx PE
to each Fy PE. Each PE collects the information that has been sent down its abutting
connections and produces a single output value. There are two important qualities that a PE
must possess:

•  Local Operations. Described earlier in §2.

•  Single Output Value. Each PE produces a single output value that is propagated
through the connections from the emitting PE to other receiving PEs or it will be
output from the network.

These two qualities allow neural networks to operate in parallel. The value of the PE and its
label use the same symbol. As an example, the output PE label yj in Figure 3 represents both
the PE’s placement in the network and its value.


There are several mechanisms for computing the output of a processing element. The
output value of the PE shown in Figure 3(b), yj, is a function of the outputs of the preceding
layer, Fx = X = (x1, x2, ..., xn), and the weights from Fx to yj, Wj = (w1j, w2j, ..., wnj).
Mathematically, the output of yj is a function of its inputs and its weights,

    y_j = F(X, W_j).        (1)

3.4.1.  Linear  Combination 

The most common computation performed by a PE is a linear combination (dot-product) of
the input values, X, with the abutting connection weights, Wj, followed by a threshold
operation (cf. Simpson, 1990a; Hecht-Nielsen, 1990; Maren, Harston & Pap, 1990). Using the
PE in Figure 3(b) as an example, the output yj is computed using the equation

    y_j = f\left( \sum_{i=0}^{n} x_i w_{ij} \right)        (2)

where Wj = (w1j, w2j, ..., wnj) and f is one of the threshold functions described in §3.5 of this
chapter. The dot product update has a very appealing quality that is intrinsic to its
computation. Using the relationship cos(A_k, W_j) = (A_k · W_j) / (||A_k|| ||W_j||), it is seen
that the larger the dot product (assuming fixed length A_k and W_j) the more similar the two
vectors are. Hence, the dot product can be viewed as a similarity measure.
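
The similarity interpretation can be illustrated with a short sketch. The helper names and sample vectors are hypothetical; the only relation used is cos(A, W) = A · W / (||A|| ||W||).

```python
import math


def dot(a, w):
    return sum(ai * wi for ai, wi in zip(a, w))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine(a, w):
    """Cosine of the angle between pattern a and weight vector w."""
    return dot(a, w) / (norm(a) * norm(w))


w = [1.0, 0.0]            # weight vector of unit length
aligned = [1.0, 0.0]      # same direction as w
oblique = [0.6, 0.8]      # unit length, rotated away from w
orthogonal = [0.0, 1.0]   # unit length, perpendicular to w

# For fixed lengths the dot product orders the patterns by similarity to w.
ranking = [dot(p, w) for p in (aligned, oblique, orthogonal)]
```

Because all three patterns have unit length, the dot products equal the cosines, and the aligned pattern scores highest.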

3.4.2.  Mean-Variance Connections

In some instances PEs will have two connections interconnecting PEs instead of just
one, as shown in Figure 4. One use of these dual connections is to allow one set of the abutting
connections to represent the mean of a class and the other the variance of the class (Lee & Kil,
1989; Robinson, Niranjan, & Fallside, 1988). In this case, the output value of the PE depends
on the inputs and both sets of connections, i.e. yj = F(X, Vj, Wj), where the mean connections
are represented by Wj = (w1j, w2j, ..., wnj) and the variance connections by
Vj = (v1j, v2j, ..., vnj) for the PE yj. Using this scheme, the output of yj is calculated by taking
the difference between the input, X, and the mean, Wj, dividing by the variance, Vj, squaring
the resulting quantity, and passing this value through a Gaussian threshold function to produce
the final output value as follows

    y_j = g\left( \sum_{i=1}^{n} \left( \frac{x_i - w_{ij}}{v_{ij}} \right)^2 \right)        (3)


where  the  Gaussian  threshold  function  is 


six)  =  expi^)  (4) 

The  Gaussian  threshold  function  is  described  in 
greater  detail  in  §3.5.5. 
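
A mean-variance PE can be sketched as follows. This is an illustration under stated assumptions: the squared, variance-scaled differences are summed over all inputs, and the Gaussian threshold is taken to be g(x) = exp(-x / 2), a common choice whose scaling may differ from that of equation (4). The function names are hypothetical.

```python
import math


def gaussian_threshold(x):
    # Assumed form g(x) = exp(-x / 2); the scaling in Eq. (4) may differ.
    return math.exp(-0.5 * x)


def mean_variance_pe(x, w_mean, v_var):
    """Output of a PE with mean connections w_mean and variance connections v_var."""
    s = sum(((xi - wi) / vi) ** 2 for xi, wi, vi in zip(x, w_mean, v_var))
    return gaussian_threshold(s)


w_mean = [1.0, 2.0]   # class mean stored in the first set of connections
v_var = [0.5, 0.5]    # class spread stored in the second set of connections

at_mean = mean_variance_pe([1.0, 2.0], w_mean, v_var)    # 1.0
near = mean_variance_pe([1.1, 2.1], w_mean, v_var)
far = mean_variance_pe([3.0, 0.0], w_mean, v_var)
```

The output is 1 when the input coincides with the class mean and falls towards 0 as the input moves away, with the variance connections setting how quickly.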

3.4.3.  Min-Max  Connections 

Another less common use of dual connections is to assign one of the abutting vectors,
say Vj, to become the minimum bound for the class and the other vector, Wj, to become the
maximum bound for the same class. By measuring the amount of the input pattern that falls
within the bounds, a min-max activation value is produced (Simpson, 1990b). Figure 5
illustrates this notion using a graph representation for the min and the max points. The ordinate
of the graph represents the value of each element of the min and max vectors and the abscissa
of the graph represents the dimensionality of the classification space. The input pattern, X, is
compared with the bounds of the class. The amount of disagreement between the class
bounds, Vj and Wj, and the input pattern, X, is shown by the shaded regions. The measure of
these shaded regions produces an activation value yj.

A fuzzy set, A, is defined as a set of ordered pairs, A = {x, m_A(x)}. A direct analogy with
fuzzy sets is found when the min-max class is the collection of points defining some set and
the classification function is the membership function. When cast in this framework, each
class in a fuzzy min-max network is actually a fuzzy set. The classification value produced
from the fuzzy min-max PEs represents the degree to which an input pattern (object) fits
within each of the classes (fuzzy sets). Referring once again to Figure 5 and utilizing this
fuzzy logic scheme, the max bound, Wj, is the maximum point allowed in class j and the min
bound, Vj, is the minimum point allowed in class j. Measuring the degree to which X falls
between Vj and Wj can be done by measuring [...] j. Rescaling the n-dimensional space to lie
within the unit cube allows the use of the fuzzy supersethood and subsethood measures to
produce classification values (Kosko, 1986a). The activation value of yj (the degree to which X
belongs to the class j) is defined as the degree to which X is a superset of Wj times the degree
to which Vj is a subset of X, yielding the output value

    y_j = (1 - supersethood(X, W_j)) \times (1 - subsethood(X, V_j))        (5)

It is easy to show that yj is bounded to the closed interval from 0 to 1. When yj = 1, X lies
completely within the min-max bounds. When yj = 0, X falls completely outside of the min-max
bounds. When 0 < yj < 1, the value describes the degree to which X is contained by the
min-max bounds.

3.5.  Threshold  Functions 

Threshold  functions,  also  referred  to  as 
activation  functions,  squashing  functions,  or 
signal  functions,  map  a  PE’s  (possibly)  infinite 
domain  to  a  prespecified  range.  Although  the 
number  of  threshold  functions  possible  is  quite 
varied,  there  are  five  that  are  regularly 
employed  by  the  majority  of  neural  networks: 
(1)  linear,  (2)  step,  (3)  ramp,  (4)  sigmoid,  and 
(5) Gaussian. With the exception of the linear threshold function,
all of these introduce a nonlinearity in the network dynamics by
bounding a PE's output values to a fixed range.

3.5.1. Linear Threshold Function

The linear threshold function (see Figure 6(a)) produces a linearly
modulated output from the input x as described by the equation

$$f(x) = \alpha x \tag{6}$$

where x ranges over the real numbers and α is a positive scalar.
If α = 1, it is equivalent to removing the threshold function
completely.

3.5.2. Step Threshold Function

The step threshold function (see Figure 6(b)) produces only two
values, β and -δ. If the input to the threshold function, x,
equals or exceeds the threshold value, θ, then the step threshold
function produces the value β; otherwise it produces the value -δ,
where β and δ are positive scalars. Mathematically this function
is described as

$$f(x) = \begin{cases} \beta & \text{if } x \ge \theta \\ -\delta & \text{if } x < \theta \end{cases} \tag{7}$$

Typically the step threshold function produces a binary value in
response to the sign of the input, emitting +1 if x is positive
and 0 if it is not. By making the assignments β = 1, δ = 0, and
θ = 0, the step threshold function becomes the binary step function

$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

which is common to neural networks such as the Hopfield neural
network (Amari, 1972; Hopfield, 1982) and the Bidirectional
Associative Memory (Kosko, 1988). One small variation of equation
(8) is the bipolar threshold function

$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{otherwise} \end{cases} \tag{9}$$

which replaces the 0 output value with a -1. In punish-reward
systems such as the Associative Reward-Penalty (Barto, 1985), the
negative value is used to ensure changes, where a 0 will not.


3.5.3.  Ramp  Threshold  Function 

The ramp threshold function (see Figure 6(c)) is a combination of
the linear and step threshold functions. It places an upper and
lower bound on the values that the threshold function produces and
allows a linear response between the bounds. These saturation
points are symmetric around the origin, and the slope of the
function is discontinuous at the points of saturation. The ramp
threshold function is defined as

$$f(x) = \begin{cases} \gamma & \text{if } x \ge \gamma \\ x & \text{if } |x| < \gamma \\ -\gamma & \text{if } x \le -\gamma \end{cases} \tag{10}$$

where γ is the saturation value for the function and the points
x = γ and x = -γ are where the discontinuities in the slope of f
exist.

3.5.4. Sigmoid Threshold Function

The sigmoid threshold function (see Figure 6(d)) is a continuous
version of the ramp threshold function. The sigmoid (S-shaped)
function is a bounded, monotonic, non-decreasing function that
provides a graded, nonlinear response within a prespecified range.

The most common sigmoid function is the logistic function

$$f(x) = \frac{1}{1 + e^{-\alpha x}} \tag{11}$$

where α > 0 (usually α = 1), which provides an output value from
0 to 1. This function is familiar to statistics (as the logistic
distribution function), chemistry (describing catalytic reactions),
and sociology (describing human population growth). Note that a
relationship between equation (11) and equation (8) exists. As
α → ∞ in equation (11), the slope of the sigmoid function between
0 and 1 becomes infinitely steep and, in effect, the sigmoid
becomes the step function described by equation (8).


Two alternatives to the logistic sigmoid function are the
hyperbolic tangent

$$f(x) = \tanh(x) \tag{12}$$

which ranges from -1 to 1, and the augmented ratio of squares

$$f(x) = \begin{cases} x^2 / (1 + x^2) & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

which ranges from 0 to 1.

3.5.5. Gaussian Threshold Function

The Gaussian threshold function (see Figure 6(e)) is a radial
function (symmetric about the origin) that requires a variance
value, v > 0, to shape the Gaussian function. In some networks the
Gaussian function is used in conjunction with a dual set of
connections as described earlier by equation (3), and in other
instances (Specht, 1990) the variance is predefined. In the latter
instance, the threshold function is

$$f(x) = \exp\left(-\frac{x^2}{v}\right) \tag{14}$$

where x is the mean and v is the predefined variance.
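
The threshold functions of this section can be sketched in a few lines of Python. This is an illustrative sketch only: the function and parameter names are chosen here, and the Gaussian is assumed to take the form exp(-x^2 / v), which is one common reading of a Gaussian with a predefined variance v.

```python
import math

def linear(x, alpha=1.0):
    """Equation (6): unbounded, linearly modulated output."""
    return alpha * x

def step(x, beta=1.0, delta=0.0, theta=0.0):
    """Equation (7): beta at or above the threshold theta, -delta below it."""
    return beta if x >= theta else -delta

def ramp(x, gamma=1.0):
    """Equation (10): linear between -gamma and +gamma, saturated outside."""
    return max(-gamma, min(gamma, x))

def logistic(x, alpha=1.0):
    """Equation (11): sigmoid with output between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-alpha * x))

def ratio_of_squares(x):
    """Equation (13): x^2 / (1 + x^2) for positive x, 0 otherwise."""
    return x * x / (1.0 + x * x) if x > 0 else 0.0

def gaussian(x, v=1.0):
    """Assumed form exp(-x^2 / v) with a predefined variance v > 0."""
    return math.exp(-(x * x) / v)
```

Raising alpha in `logistic` makes the transition around zero steeper, which illustrates the relationship between equations (11) and (8): `logistic(0.1, alpha=100)` is already very close to 1 and `logistic(-0.1, alpha=100)` very close to 0. The hyperbolic tangent of equation (12) is available directly as `math.tanh`.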


4. NEURAL NETWORK TOPOLOGIES

The building blocks for neural networks are in place. Neural
network topologies now evolve from the patterns, PEs, connections,
and threshold functions described in §3. Neural networks consist of
layer(s) of PEs interconnected by weighted connections. The
arrangement of the PEs, connections and patterns into a neural
network is referred to as a topology. After introducing some
terminology, six common neural network topologies will be
described.

4.1. Terminology

4.1.1. Layers

Neural networks are organized into layers of PEs. PEs within a
layer are similar in two respects: (1) the connections that feed
the layer of PEs are from the same source, e.g. the Fx layer of PEs
in Figure 2 all receive their inputs from the input pattern and the
layer of Fy PEs all receive their inputs from the Fx PEs; and (2)
the PEs in each layer utilize the same type of update dynamics,
e.g. all the PEs will use the same type of connections and the same
type of threshold function.

4.1.2. Intralayer vs. Interlayer Connections

There are two types of connections that a neural network employs:
intralayer connections and interlayer connections. Intralayer
connections are connections between PEs in the same layer.
Interlayer connections are connections between PEs in different
layers. It is possible to have neural networks that consist of
one, or both, types of connections.

4.1.3. Feedforward vs. Feedback Networks

When a neural network has connections that feed information in
only one direction, from input to output, without any feedback
pathways in the network, it is a feedforward neural network. The
network is a feedback network if the network has any feedback
paths, where feedback is defined as any path through the network
that would allow the same PE to be visited twice.
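
The definition of feedback in §4.1.3 (any path that would allow the same PE to be visited twice) amounts to asking whether the directed connection graph contains a cycle. The sketch below is one way to check this, assuming a hypothetical representation in which a dictionary maps each PE name to the list of PEs it feeds.

```python
def has_feedback(connections):
    """Return True if some path lets a PE be visited twice (a directed cycle).

    `connections` maps each PE name to the list of PEs it feeds.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(pe):
        color[pe] = GRAY
        for nxt in connections.get(pe, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:              # reached a PE already on this path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[pe] = BLACK
        return False

    return any(color.get(pe, WHITE) == WHITE and visit(pe)
               for pe in list(connections))

# A two-layer feedforward network, a two-PE feedback loop, and a self-connection.
feedforward = {"x1": ["y1", "y2"], "x2": ["y1", "y2"], "y1": [], "y2": []}
feedback = {"x1": ["y1"], "y1": ["x1"]}
self_loop = {"x1": ["x1"]}
```

Under this reading, a single-layer network whose PEs connect to themselves or to each other in both directions is always a feedback network.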


4.2.  Instars,  Outstars  &  the  Adaline 

The two simplest neural networks are the instar and the outstar
(Grossberg, 1982). The instar (see Figure 7(a)) is the minimal
pattern encoding network. A simple example of an encoding
procedure for the instar would take the pattern,
$A_k = (a_{k1}, ..., a_{kn})$, normalize it, and use the values as
the weights, $W_j = (w_{1j}, w_{2j}, ..., w_{nj})$, as shown by the
equation

$$w_{ij} = \frac{a_{ki}}{\sum_{h=1}^{n} a_{kh}} \tag{15}$$

for all i = 1, 2, ..., n.

The dual of the instar is the outstar (see Figure 7(b)). The
outstar is the minimal pattern recall neural network. An output
pattern is generated according to

$$z_i = \ldots \tag{16}$$

for all i = 1, 2, ..., p, where the weights are determined using
equation (15) or one of the learning algorithms described in §5.
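
As a small illustration of instar encoding, the sketch below normalizes a pattern and uses the result as the weight vector. It assumes that normalization means dividing each component by the sum of the components (Euclidean normalization is a common alternative), and it assumes the instar PE responds with a plain weighted sum; the function names are chosen here for illustration.

```python
def instar_encode(pattern):
    """Normalize a pattern and use the result as the instar weight vector."""
    total = sum(pattern)
    if total == 0:
        raise ValueError("cannot normalize an all-zero pattern")
    return [a / total for a in pattern]

def instar_response(weights, x):
    """Weighted sum of the inputs arriving at the single instar PE."""
    return sum(w * xi for w, xi in zip(weights, x))

weights = instar_encode([2.0, 1.0, 1.0])   # [0.5, 0.25, 0.25]
```

With these weights, presenting the encoded pattern `[2.0, 1.0, 1.0]` again yields a response of 1.5, larger than the response to a pattern of the same total activity spread differently, such as `[1.0, 1.0, 2.0]`.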

The ADALINE, ADAptive LInear NEuron (Widrow & Hoff, 1960), has the
same topology as the instar (see Figure 7(a)), but the weights,
$W_j$, are adjusted using the Least-Mean-Square (LMS) algorithm
(see §5.7.1). In the framework of adaptive signal processing, a
similar topology with the same functionality is referred to as a
finite impulse response (FIR) filter (Widrow & Stearns, 1985).
Applications of the FIR filter to noise cancellation, echo
cancellation, adaptive antennas, and control are numerous
(Widrow & Winter, 1988).

4.3. Single-layer Networks: Autoassociation,
Optimization, and Contrast Enhancement

Beyond the instar/outstar neural networks are the single-layer
intraconnected neural networks. Figure 8 shows the topology of a
one-layer neural network which consists of n Fx PEs. The
connections are from each Fx PE to every other Fx PE and itself,
yielding a connection matrix with n² entries. The single-layer
neural network accepts an n-dimensional input pattern in one of
three ways:

•  PE  Initialization  Only.  The  input  pattern  is 
used  to  initialize  the  Fx  PEs  and  the  input 
pattern  does  not  influence  the  processing 
thereafter. 

•  PE  Initialization  and  Constant  Bias.  The 
input  pattern  is  used  to  initialize  the  Fx  PEs 
and  the  input  remains  as  a  constant  valued 
input  bias  throughout  processing. 

•  Constant Bias Only. The PEs are initialized
to all zeroes and the input pattern acts as a
constant valued bias throughout processing.

One-layer neural networks are used for pattern completion, noise
removal, optimization, and contrast enhancement. The first two
operations are performed by autoassociatively encoding patterns
and typically using the input pattern for PE initialization only.
The optimization networks are dynamical systems that stabilize to
a state that represents a solution to an optimization problem and
typically use the inputs both for PE initialization and as
constant biases. Contrast enhancement networks use the input
patterns for PE initialization only and can operate in such a way
that eventually only one PE remains active. Each of these
one-layer neural networks is described in greater detail in the
following paragraphs.

4.3.1.  Pattern  Completion 

Pattern  completion  in  a  single-layer  neural 
network  is  performed  by  presenting  a  partial 
pattern  initially,  and  relying  upon  the  neural 
network  to  complete  the  remaining  portions. 
As an example, assume a single-layer neural network has stored
images of human faces. If half of a face is presented to the
neural network as the initial state of the network, the neural
network would complete the missing half of the face and output a
complete face.

4.3.2. Noise Removal

Noise removal is similar to pattern completion in that a complete,
noise-free response is desired from a pattern corrupted by noise.
Fundamentally there is no difference between noise removal and
pattern completion. The difference tends to be entirely
operational. Using the previous image storage example, if a blurry
or splotchy image is presented to the neural network, the output
would be a crisp, clear image. Single-layer neural networks
designed for pattern completion and noise cancellation include the
Discrete Hopfield network (Hopfield, 1982), the Brain-State-in-a-Box
(Anderson, et al., 1977), and the Optimal Linear Associative
Memory (Kohonen, 1984).
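
Noise removal with a single-layer network can be illustrated with a minimal discrete Hopfield-style sketch. The choices below are assumptions made for illustration, not details taken from this section: patterns are bipolar, weights are formed by summing outer products of the stored patterns, self-connections are set to zero, and PEs are updated one at a time with the bipolar threshold of equation (9).

```python
def train_hopfield(patterns):
    """Outer-product storage of bipolar patterns with zero self-connections."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_sweeps=10):
    """Asynchronous recall: update one PE at a time until nothing changes."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            net = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if net > 0 else -1   # bipolar threshold, equation (9)
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]      # orthogonal to p1
w = train_hopfield([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first component flipped
```

Starting the network from `noisy` returns the stored pattern `p1`, and both stored patterns are stable states of the recall procedure.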

4.3.3. Neural Optimization

One of the most prevalent uses of neural networks is optimization
(Hopfield & Tank, 1985; Tank & Hopfield, 1986). Optimization is a
technique for solving a problem by casting it into a mathematical
equation that, when either maximized or minimized, solves the
problem. Typical examples of problems approached using an
optimization technique include scheduling, routing, and resource
allocation. The neural optimization approach casts the
optimization problem into the form of an energy function that
describes the dynamics of a neural system. If the neural network
dynamics are such that the network will always seek a stable state
when the energy function is at a minimum, then the network will
automatically find a solution. The inputs to the neural network
are the initial state of the neural network and the final PE
values represent the parameters of a solution.

Figure 9: Local and Global Contrast Enhancement

4.3.4.  Contrast  Enhancement 

Contrast enhancement in single-layer neural networks is achieved
using on-center/off-surround connection values. The on-center
connections are positive self-connections, i.e. $w_{ii} = \alpha$
(α > 0) for all i = 1, 2, ..., n, that allow a PE's activation
value to grow by feeding back upon itself. The off-surround
connections are negative neighbor connections, i.e.
$w_{ij} = -\beta$ (β > 0) for all i not equal to j, that compete
with the on-center connections. The competition between the
positive, on-center, and the negative, off-surround, activation
values is referred to as competitive dynamics. Contrast
enhancement neural networks take one of two forms: locally
connected and globally connected. If the Fx PEs are only connected
to a few of the neighboring PEs (see Figure 9(a)), the result is a
local competition that can result in several large activation
values. If the off-surround connections are fully interconnected
across the Fx layer (see Figure 9(b)), the competition will yield
a single winner.
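
A globally connected on-center/off-surround competition can be sketched as a simple discrete-time iteration. The update rule, the clipping of activations to the interval from 0 to 1, and the values alpha = 1.0 and beta = 0.2 are all assumptions chosen here for illustration; they are one of many ways to realize competitive dynamics.

```python
def compete(x, alpha=1.0, beta=0.2, steps=20):
    """Iterate global on-center/off-surround competition on the activations x."""
    x = list(x)
    for _ in range(steps):
        total = sum(x)
        # On-center self-excitation minus off-surround inhibition from all
        # other PEs, clipped to the range [0, 1].
        x = [min(1.0, max(0.0, alpha * xi - beta * (total - xi))) for xi in x]
    return x

final = compete([0.2, 0.5, 0.9, 0.4])
winner = final.index(max(final))
```

For this starting pattern the PE that began with the largest activation (index 2) is the only one still active after a few iterations, which is the single-winner behavior described for the globally connected case.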


4.4.  Two-layer  Networks:  Heteroassociation 
and  Classification 

Two-layer neural networks consist of a layer of n Fx PEs fully
interconnected to a layer of p Fy PEs as shown in Figure 10. The
connections from the Fx to Fy PEs form the n-by-p weight matrix W,
where $w_{ij}$ represents the weight for the connection from the
i'th Fx PE, $x_i$, to the j'th Fy PE, $y_j$. There are three
common types of two-layer neural networks: feedforward pattern
matchers, feedback pattern matchers, and feedforward pattern
classifiers.

Figure 10: Examples of Two-layer Neural Networks

4.4.1.  Feedforward  Pattern  Matching 

A two-layer feedforward pattern matching neural network maps the
input patterns, $A_k$, to the corresponding output patterns,
$B_k$, k = 1, 2, ..., m. The network shown in Figure 10(a)
illustrates the topology of this feedforward network. The
two-layer feedforward neural network accepts the input pattern
$A_k$ and produces an output pattern, $Y = (y_1, y_2, ..., y_p)$,
that is the network's best estimate of the proper output given
$A_k$ as the input. An optimal mapping between the inputs and the
outputs is one that produces the correct response $B_k$ when $A_k$
is presented to the network, k = 1, 2, ..., m. Most two-layer
networks are concerned with finding the optimal linear mapping
between the pattern pairs $(A_k, B_k)$ (cf. Widrow & Winter, 1988;
Kohonen, 1984), but there are other two-layer feedforward networks
that also work with nonlinear mappings by extending the input
patterns to include multiplicative combinations of the original
inputs (Pao, 1989; Maren, Harston & Pap, 1990).

4.4.2.  Feedback  Pattern  Matching 

A two-layer feedback pattern matching neural network, shown in
Figure 10(b), accepts inputs from either layer of the network,
Fx or Fy, and produces the output for the other layer
(Kosko, 1988; Simpson, 1990).

4.4.3.  Feedforward  Pattern  Classification 

A two-layer pattern classification neural network, shown in Figure
10(c), maps an input pattern, $A_k$, to one of p classes. By
representing each class as a separate Fy PE, the pattern
classification task is then reduced to selecting the Fy PE that
best responds to the input pattern. Most two-layer pattern
classification systems utilize the competitive dynamics of global
on-center/off-surround connections to perform the classification.
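
The selection of the best-responding Fy PE can be sketched as follows. This is a simplified illustration: each Fy PE is assumed to respond with a weighted sum over the n-by-p weight matrix, and a direct maximum stands in for the competitive on-center/off-surround dynamics that a network would use to pick the winner. The weights in the example are made up.

```python
def classify(weights, pattern):
    """Return (winning class index, all Fy responses) for an input pattern.

    weights[i][j] is the connection from the i'th Fx PE to the j'th Fy PE.
    """
    p = len(weights[0])
    responses = [sum(pattern[i] * weights[i][j] for i in range(len(pattern)))
                 for j in range(p)]
    best = max(range(p), key=responses.__getitem__)
    return best, responses

# Hypothetical 3-by-2 weight matrix: class 0 listens to the first two inputs,
# class 1 listens to the third.
W = [[1, 0],
     [1, 0],
     [0, 1]]
```

Here `classify(W, [1, 1, 0])` selects class 0 and `classify(W, [0, 0, 1])` selects class 1.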

4.5. Multi-layer Networks: Heteroassociation
and Function Approximation

A multi-layer neural network has more than two layers, possibly
many more. A general description of a multi-layer neural network
is shown in Figure 11, where there is an input layer of PEs, Fx,
L hidden layers of Fy PEs ($Y_1, Y_2, ..., Y_L$), and a final
output layer, Fz. The Fy layers are called hidden layers because
there are no direct connections between the input/output patterns
and these PEs; rather, they are always accessed through another
set of PEs such as the input and output PEs. Although Figure 11
shows connections only from one layer to the next, it is possible
to have connections that skip over layers, that connect the input
PEs to the output PEs, or that connect PEs together within the
same layer. The added benefit of these PEs is not fully
understood, but many applications such as prediction and
classification are employing these types of topologies.

Multi-layer neural networks are used for pattern classification,
pattern matching and function approximation. By adding a
continuously differentiable threshold function, such as a Gaussian
or sigmoid function, it is possible to learn practically any
nonlinear mapping to any desired degree of accuracy (White, 1989).
The mechanism that allows such complex mappings to be acquired is
not fully understood for each type of multi-layer neural network,
but in general the network partitions the input space into regions
and a mapping from the partitioned regions to the next space is
performed by the next set of connections to the next layer of PEs,
eventually producing an output response. This capability allows
some very complex decision regions to be formed for classification
and pattern matching problems, as well as for applications that
require function approximation.
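
To make the idea of a nonlinear mapping through one hidden layer concrete, the sketch below runs a forward pass through a small three-layer network with logistic threshold functions. The weights and bias terms are hand-picked for illustration (they are not learned, and bias inputs are an assumption not discussed in this section); they make the network compute the exclusive-or of two binary inputs, a mapping that a purely linear two-layer network cannot represent.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sums passed through the logistic."""
    return [logistic(sum(x * w for x, w in zip(inputs, incoming)) + b)
            for incoming, b in zip(weights, biases)]

# weights[j] holds the incoming weights of PE j in that layer (assumed values).
HIDDEN_W, HIDDEN_B = [[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]
OUTPUT_W, OUTPUT_B = [[20.0, 20.0]], [-30.0]

def forward(x):
    """Fx inputs -> one hidden Fy layer -> single Fz output."""
    hidden = layer(x, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]
```

The first hidden PE responds when either input is on, the second when not both are on, and the output PE responds only when both hidden PEs are active, so `forward([0, 1])` and `forward([1, 0])` are close to 1 while `forward([0, 0])` and `forward([1, 1])` are close to 0.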

There are several issues that must be addressed when working with
multi-layer neural networks. How many layers are enough for a
given problem? How many PEs are needed in each hidden layer? How
much data is needed to produce a sufficient mapping from the input
layer to the output layer? Some of these issues have been
successfully dealt with. As an example, several researchers have
proven that three layers are sufficient to perform any nonlinear
mapping (with the exception of a few remote pathological cases) to
any desired degree of accuracy with only one layer of hidden PEs
(see White, 1989 for a review of this work). Although this is a
very important result, it still does not indicate what the proper
number of hidden layer PEs is, or if the same solution can be
obtained with more layers but fewer hidden PEs and connections
overall.


There are several ways that multi-layer neural networks can have
their connection weights adjusted to learn mappings. The most
popular technique is the backpropagation algorithm (Werbos, 1974;
Parker, 1982; Rumelhart, Hinton & Williams, 1986) and its many
variants (see Simpson, 1990a for a list). Other multi-layer
networks include the Neocognitron (Fukushima, 1988), the
Probabilistic Neural Network (Specht, 1990), the Boltzmann Machine
(Ackley, Hinton & Sejnowski, 1985), and the Cauchy Machine
(Szu, 1986).

4.6.  Randomly  Connected  Networks 

Randomly connected neural networks are networks that have
connection weights that are randomly assigned within a specific
range. Some randomly connected networks have binary valued
connections. Since a connection weight equal to zero is equivalent
to no connection being present, binary valued random connections
create sparsely connected networks. Randomly connected networks
are used in three different ways:

•  Initial weights - The initial connection values for the network
prior to training are preset to random values within a predefined
range. This technique is used extensively in error-correction
learning systems (see §5.5 - §5.6 below).

•  Pattern preprocessing - A set of fixed random binary valued
connections is placed between the first two layers of a
multi-layer neural network as a pattern preprocessor. Such random
connections can be used to increase the dimensionality of the
space that is being used for mappings in an effort to improve the
pattern mapping capability. This approach was pioneered with the
early Perceptron (Rosenblatt, 1962) and has been used recently in
the Sparse Distributed Memory (Kanerva, 1988).

•  Intelligence from randomness - Early studies in neural networks
spent a great deal of effort analyzing randomly connected binary
valued systems. The model of the brain as a randomly connected
network of neurons prompted this research. These fixed weight,
non-adaptive systems have been studied extensively by Amari (1971)
and Rozonoer (1969).

5. NEURAL NETWORK LEARNING

Perhaps the most appealing quality of neural networks is their
ability to learn. Learning, in this context, is defined as a
change in connection weight values that results in the capture of
information that can later be recalled. There are several
different procedures available for changing the values of
connection weights. After an introduction to some terminology,
eight different learning methods will be described. For continuity
of discussion, the learning algorithms will be described in
pointwise notation (as opposed to vector notation). In addition,
the learning algorithms will be described using discrete-time
equations (as opposed to continuous time). The use of
discrete-time equations makes them more accessible to digital
computer simulations.

5.1.  Terminology 

5.1.1.  Supervised  vs.  Unsupervised  Learning 

All learning methods can be classified into two categories,
supervised learning and unsupervised learning. Supervised learning
is a process that incorporates an external teacher and/or global
information. The supervised learning algorithms that will be
discussed in the following sections include error correction
learning, reinforcement learning, stochastic learning, and
hardwired systems. Examples of supervised learning include
deciding when to turn off the learning, deciding how long and how
often to present each association for training, and supplying
performance (error) information. Supervised learning is further
classified into two subcategories: structural learning and
temporal learning. Structural learning is concerned with finding
the best possible input-output relationship for each individual
pattern pair. Examples of structural learning include pattern
matching and pattern classification. The majority of the learning
algorithms discussed below focus on structural learning. Temporal
learning is concerned with capturing a sequence of patterns
necessary to achieve some final outcome. In temporal learning the
current response of the network is dependent on previous inputs
and responses. In structural learning, there is no such
dependence. Examples of temporal learning include prediction and
control. The reinforcement learning algorithm, discussed below, is
an example of a temporal learning procedure.

Unsupervised learning, also referred to as self-organization, is a
process that incorporates no external teacher and relies upon only
local information during the entire learning process. Unsupervised
learning organizes presented data and discovers its emergent
collective properties. Examples of unsupervised learning that will
be discussed in the following sections include Hebbian learning,
principal component learning, differential Hebbian learning,
min-max learning, and competitive learning.

5.1.2.  Off-line  vs.  On-line  Learning 

Most learning techniques utilize off-line learning. When the
entire pattern set is used to condition the connections prior to
the use of the network, it is called off-line learning. As an
example, the backpropagation training algorithm (see §5.7.2) is
used to adjust connections in multi-layer neural networks, but it
requires thousands of cycles through all the pattern pairs until
the desired performance of the network has been achieved. Once the
network is performing adequately, the weights are frozen and the
resulting network is used in recall mode thereafter. Off-line
learning systems have the intrinsic requirement that all the
patterns have to be resident for training. Such a requirement does
not make it possible to have new patterns automatically
incorporated into the network as they occur; rather, these new
patterns must be added to the entire set of patterns and the
neural network must be retrained.

Not all neural networks perform off-line learning. There are some
networks that can add new information "on the fly"
non-destructively. If a new pattern needs to be incorporated into
the network's connections, it can be done immediately without any
loss of prior stored information. The advantage of off-line
learning networks is they usually provide superior solutions to
difficult problems such as nonlinear classification, but on-line
learning allows the neural network to learn in situ. A challenge
in the future of neural network computing is the development of
learning techniques that provide high-performance on-line learning
without extreme costs.

5.2.  Hebbian  Correlations 

The simplest form of adjusting connection weight values in a
neural network is based upon the correlation of PE activation
values. The motivation for correlation-based adjustments has been
attributed to Hebb (1949), who hypothesized that the change in a
synapse's efficacy (its ability to fire, or as we are simulating
it in our neural networks, the connection weight) is prompted by a
neuron's ability to produce an output signal. If a neuron, A, was
active, and A's activity caused a connected neuron, B, to fire,
then the efficacy of the synaptic connection between A and B
should be increased.

5.2.1.  Unbounded  PE  Values  and  Weights 

This form of learning, now commonly referred to as Hebbian
learning, has been mathematically characterized as the correlation
weight adjustment

$$w_{ij}^{new} = w_{ij}^{old} + x_i y_j \tag{17}$$

where: i = 1, 2, ..., n; j = 1, 2, ..., p; $x_i$ is the value of
the i'th PE in the Fx layer of a two-layer network; $y_j$ is the
value of the j'th Fy PE; and the connection weight between the two
PEs is $w_{ij}$. In general, the values of the PEs can range over
the real numbers and the weights are unbounded. When the PE values
and connection values are unbounded, these two-layer neural
networks are amenable to linear systems theory. Neural networks
like the Linear Associative Memory (Anderson, 1970; Kohonen, 1972)
employ this type of learning and analyze the capabilities of these
networks using linear systems theory as a guide. The number of
patterns that a network trained using equation (17) with unbounded
PE values and weights can store is limited to the dimensionality
of the input patterns (cf. Simpson, 1990a).
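
The correlation weight adjustment can be sketched for a two-layer linear associative memory. The sketch assumes the usual Hebbian form in which each weight is incremented by the product of the values of the two PEs it connects, and it assumes linear recall by weighted sums; the pattern pairs are made up for illustration.

```python
def hebbian_train(pairs, n, p):
    """Accumulate w[i][j] += x[i] * y[j] over every (x, y) pattern pair."""
    w = [[0.0] * p for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(p):
                w[i][j] += x[i] * y[j]
    return w

def linear_recall(w, x):
    """Linear recall: y[j] is the sum over i of x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x)))
            for j in range(len(w[0]))]

# Two pattern pairs with orthonormal inputs (n = 3, p = 2).
pairs = [([1.0, 0.0, 0.0], [1.0, 2.0]),
         ([0.0, 1.0, 0.0], [3.0, -1.0])]
w = hebbian_train(pairs, n=3, p=2)
```

Because the two input patterns are orthonormal, each one recalls its associated output exactly. With non-orthogonal inputs the recalled outputs would contain crosstalk from the other stored pairs, which is one way to see why storage is limited by the input dimensionality.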

5.2.2.  Bounded  PE  Values  &  Unbounded 
Weights 

Recently, implementations that restrict the values of the PEs
and/or the weights of equation (17) have been employed. These
networks, called Hopfield Networks because John Hopfield excited
people about their potential (Hopfield, 1982), restrict the PE
values to either binary {0, 1} or bipolar {-1, +1} values.
Equation (17) is used for these types of correlations.

These discrete-valued networks typically involve some form of
feedback recall, resulting in the need to show that every input
will produce a stable response (output). By limiting the PE values
during processing, nonlinearities are introduced in the system,
eliminating some of the linear systems theory analyses that had
previously been performed. By adding feedback into the recall
process, a discrete-valued, nonlinear, dynamical system is formed.
The single-layer versions of this learning rule are described as
Hopfield nets (Hopfield, 1982) and the two-layer versions as the
Bidirectional Associative Memory (Kosko, 1988). Some of the
earlier analysis of these networks was performed by Amari (1972 &
1977), who used the theory of statistical neurodynamics to show
these networks were stable. Later, Hopfield (1982) found an
alternative method to prove stability. Also, the number of
patterns that neural networks of this form can store is limited
(McEliece, et al., 1987).

5.2.3. Bounded PE Values and Weights

Sometimes both the PE values and the weights are bounded. There
are two forms of such systems. The first form is simply a running
average of the amount of correlation between two PEs. The equation

$$w_{ij}^{new} = \frac{1}{k}\left[a_{ki} b_{kj} + (k - 1)\, w_{ij}^{old}\right] \tag{18}$$

describes the average correlation during the presentation of the
k'th pattern pair $(A_k, B_k)$, where: $A_k = (a_{k1}, ..., a_{kn})$;
$B_k = (b_{k1}, ..., b_{kp})$; and k is the current pattern number,
k = 1, 2, ..., m. The same information that was stored using
equation (17) is stored using equation (18); the connection
weights are simply bounded to the unit interval in the latter case.

The  other  example  of  a  correlation  neural 
network  learning  equation  with  bounded  PE 
values  and  bounded  weights  is  the  sparse 
encoding  equation  defined  as 


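\[ w_{ij}^{new} = \begin{cases} 1 & \text{if } a_{ki} b_{kj} = 1 \\ 1 & \text{if } w_{ij}^{old} = 1 \\ 0 & \text{otherwise} \end{cases} \qquad (19) \]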


This equation assigns a binary value to a connection if the PEs on
each end of the connection have both had the value of 1 over the
course of learning. The learning equation is equivalent to
performing the logic operation

\[ w_{ij}^{new} = (a_{ki} \cap b_{kj}) \cup w_{ij}^{old} \qquad (20) \]

where $\cap$ and $\cup$ are the intersection and union operations,
respectively.
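As a rough illustration of the sparse encoding rule, the sketch below applies the logic operation pattern pair by pattern pair to binary vectors. The recall step, which thresholds each output at the number of active inputs, is an assumption added for the example and is not taken from the text; `sparse_encode` and `recall_sparse` are hypothetical names.

```python
def sparse_encode(pairs, n, p):
    """Binary weights: w[i][j] = (a_i AND b_j) OR previous w[i][j]."""
    w = [[0] * p for _ in range(n)]
    for a, b in pairs:
        for i in range(n):
            for j in range(p):
                w[i][j] = (a[i] & b[j]) | w[i][j]
    return w


def recall_sparse(w, a):
    """Assumed recall: an output fires if every active input connects to it."""
    active = sum(a)
    p = len(w[0])
    return [1 if sum(a[i] * w[i][j] for i in range(len(a))) >= active else 0
            for j in range(p)]


pairs = [([1, 0, 1, 0, 0, 0], [0, 1, 0, 0]),
         ([0, 1, 0, 0, 0, 1], [0, 0, 1, 0])]
w_sparse = sparse_encode(pairs, n=6, p=4)
out_first = recall_sparse(w_sparse, pairs[0][0])
out_second = recall_sparse(w_sparse, pairs[1][0])
```

Because a weight can only switch from 0 to 1, the order of presentation does not matter, and the patterns must stay sparse for the stored pairs to remain separable.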

Neural networks that have utilized this form of learning include
the Learnmatrix (Steinbuch & Piske, 1963) and the Willshaw
Associative Memory (Willshaw, 1980). This learning equation has a
great deal of potential. By sparsely encoding information in a
binary vector (say, for example, only 32 components out of 1 million
were set to 1 and the others were set to 0), it is possible to store
a tremendous amount of information in the network. The problem lies
in creating the code necessary to perform such dense storage (cf.
Hecht-Nielsen, 1990).

5.3. Principal Component Learning


There are some neural networks that have learning algorithms
designed to produce, as a set of weights, the principal components
of the input data patterns. The principal components of a set of
data are found by forming the covariance (or correlation) matrix of
a set of patterns and then finding the minimal set of orthogonal
vectors that span the space of the covariance matrix. Once the basis
set has been found, it is possible to reconstruct any vector in the
space with a linear combination of the basis vectors. The value of
each scalar in the linear combination represents the "importance" of
that basis vector (Lawley & Newell, 1963). It is possible to think
of the basis vectors as feature vectors, and the combination of
these feature vectors is used to construct patterns. Hence, the
purpose of a principal component network is to decompose an input
pattern into values that represent the relative importance of the
features underlying the patterns.

The first work with principal component learning was done by Oja
(1982). Oja reasoned that Hebbian learning with a feedback term that
automatically constrained the weights could extract the principal
components from the input data. The equation Oja uses is

\[ w_{ij}^{new} = w_{ij}^{old} + b_{kj} \left( \alpha\, a_{ki} - \beta\, b_{kj}\, w_{ij}^{old} \right) \qquad (21) \]

where: $a_{ki}$ is the i'th component of the k'th input pattern
$A_k$, i = 1, 2, ..., n; $b_{kj}$ is the j'th component of the k'th
output pattern $B_k$, j = 1, 2, ..., p; k = 1, 2, ..., m; and
$\alpha$ and $\beta$ are positive constants.

A variant of the work by Oja has been developed by Sanger (1989)
and is described by the equation

\[ w_{ij}^{new} = w_{ij}^{old} + \gamma_k \left( a_{ki} b_{kj} - b_{kj} \sum_{h=1}^{j} w_{ih} b_{kh} \right) \qquad (22) \]

where the variables are similar to those of equation (21) with the
exception of the non-zero, time-decreasing learning parameter
$\gamma_k$. Equations (21) and (22) are very similar; the key
difference is that equation (22) includes more information in the
feedback term and uses a decaying learning rate. There have been
many analyses and applications of principal component networks. For
a review of this work, see Oja (1989).
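A small numerical sketch can make the feedback term concrete. The code below applies the update of equation (22) to synthetic two-dimensional data whose largest variance lies along the direction (1, 1). It assumes linear output PEs (each $b_{kj}$ is the dot product of the input with the j'th weight column); the data, the learning-rate schedule, and the name `sanger_step` are all illustrative choices rather than anything specified in the text.

```python
import math
import random


def sanger_step(w, a, gamma):
    """One update of equation (22); w[i][j] links input i to output j."""
    n, p = len(w), len(w[0])
    # Linear outputs b_j = sum_i a_i * w_ij (assumption).
    b = [sum(a[i] * w[i][j] for i in range(n)) for j in range(p)]
    new_w = [row[:] for row in w]
    for j in range(p):
        for i in range(n):
            # Feedback term: sum over outputs h = 1..j of w_ih * b_h.
            feedback = sum(w[i][h] * b[h] for h in range(j + 1))
            new_w[i][j] += gamma * (a[i] * b[j] - b[j] * feedback)
    return new_w


random.seed(0)
w = [[0.3, -0.2], [0.1, 0.4]]
for k in range(1, 4001):
    s = random.gauss(0.0, 1.0)   # large variance along (1, 1)
    t = random.gauss(0.0, 0.3)   # small variance along (1, -1)
    a = [(s + t) / math.sqrt(2), (s - t) / math.sqrt(2)]
    w = sanger_step(w, a, gamma=0.05 / (1.0 + 0.001 * k))  # decaying rate

first = [w[0][0], w[1][0]]                 # weights of the first output PE
norm = math.hypot(first[0], first[1])
cosine = (first[0] + first[1]) / (math.sqrt(2) * norm)
```

For the first output PE the sum in the feedback term has a single entry, so the update coincides with equation (21) when $\alpha = \beta = \gamma_k$; the later output PEs subtract the components already captured by the earlier ones.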

5.4.  Differential  Hebbian  Learning 

Hebbian  learning  has  been  extended  to  cap¬ 
ture  the  temporal  changes  that  occur  in  pattern 
sequences.  This  learning  law,  entitled  Differen¬ 
tial  Hebbian  Learning,  has  been  independently 
derived  by  Klopf  (1986)  in  the  discrete  time 
form  and  by  Kosko  (1986b)  in  the  continuous 
time  form.  The  general  form,  some  variants, 
and  some  similar  learning  laws  are  outlined  in 
the  following  sections.  There  are  several  other 
combinations  that  have  been  explored  beyond 
those  that  are  presented  in  this  section.  A  more 
thorough examination of these Hebbian learning
rules and others can be found in Barto
(1984) and Tesauro (1986).

5.4.1.  Basic  Differential  Hebbian  Learning 

Differential Hebbian Learning correlates the changes in PE
activation values with the equation

\[ w_{ij}(t+1) = w_{ij}(t) + \Delta x_i(t-1)\, \Delta y_j(t) \qquad (23) \]

where: $\Delta x_i(t) = x_i(t) - x_i(t-1)$ is the amount of change in
the i'th Fx PE at time t; and $\Delta y_j(t) = y_j(t) - y_j(t-1)$ is
the amount of change in the j'th Fy PE at time t.

5.4.2.  Drive-Reinforcement  Learning 

Klopf (1986) uses the more general case of this equation that
captures changes in Fx PEs over the last k time steps and modulates
each change by the corresponding weight value for the connection.
Klopf's equation is

\[ w_{ij}(t+1) = w_{ij}(t) + \Delta y_j(t) \sum_{h=1}^{k} \alpha(t-h)\, \Delta x_i(t-h)\, |w_{ij}(t-h)| \qquad (24) \]

where: $\alpha(t-h)$ is a decreasing function of time that regulates
the amount of change; and $w_{ij}(t)$ is the connection value from
$x_i$ to $y_j$ at time t. Klopf refers to the pre-synaptic changes,
$\Delta x_i(t-h)$, h = 1, 2, ..., k, as drives and the post-synaptic
change, $\Delta y_j(t)$, as the reinforcement, hence the name
drive-reinforcement learning.
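The drive-reinforcement update for a single connection can be sketched as follows. The argument layout is an assumption made for the example: the lists hold the values for lags h = 1, ..., k in that order, and `alpha[h-1]` stands in for the regulating function evaluated at lag h.

```python
def differential_hebbian_step(w_now, dx_prev, dy_now):
    """Basic differential Hebbian update: correlate the two changes."""
    return w_now + dx_prev * dy_now


def drive_reinforcement_step(w_now, w_hist, dx_hist, dy_now, alpha):
    """Klopf-style update for one connection.

    w_hist[h-1]  : weight value at lag h
    dx_hist[h-1] : pre-synaptic change (drive) at lag h
    dy_now       : post-synaptic change (reinforcement) at the current step
    alpha[h-1]   : regulating coefficient for lag h (assumed decreasing)
    """
    drive = sum(alpha[h] * dx_hist[h] * abs(w_hist[h])
                for h in range(len(dx_hist)))
    return w_now + dy_now * drive


w_next = drive_reinforcement_step(
    w_now=0.5,
    w_hist=[0.5, 0.4],
    dx_hist=[1.0, 0.0],
    dy_now=0.2,
    alpha=[0.1, 0.05],
)
```

Only the lag with a non-zero drive contributes here, so the weight moves from 0.5 by 0.2 x 0.1 x 1.0 x 0.5.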

5.4.3.  Covariance  Correlation 

Sejnowski (1977) has proposed the covariance correlation of PE
activation values in the equation

\[ w_{ij}^{new} = w_{ij}^{old} + \mu \left[ (a_{ki} - \bar{x}_i)(b_{kj} - \bar{y}_j) \right] \qquad (25) \]

where the bracketed terms represent the covariance, the difference
between the expected (average) value of the PE activation values and
the input and output pattern values. The parameter $0 < \mu < 1$ is
the learning rate. The overbar on the PE values represents the
average value of the PE.

Sutton & Barto (1981) have proposed a similar type of covariance
learning rule, suggesting the correlation of the expected value of
$x_i$ with the variance of $y_j$ as expressed by the equation

\[ w_{ij}^{new} = w_{ij}^{old} + \mu\, \bar{x}_i\, (b_{kj} - \bar{y}_j) \qquad (26) \]

5.5.  Competitive  Learning 

Competitive learning, introduced by Grossberg (1970) and Malsburg
(1973) and extensively studied by Amari & Takeuchi (1978), Amari
(1983) and Grossberg (1982), is a method of automatically creating
classes for a set of input patterns. Competitive learning is a
two-step procedure that couples the recall process with the learning
process in a two-layer neural network (see Figure 12). In Figure 12
each Fx PE represents a component of the input pattern and each Fy
PE represents a class (see also §4.3.4.).

Step 1: Determine winning Fy PE. An input pattern, $A_k$, is passed
through the connections from the input layer, Fx, to the output
layer, Fy, in a feedforward fashion using the dot product update
equation

\[ y_j = \sum_{i=1}^{n} x_i w_{ij} \qquad (27) \]

where: $x_i$ is the i'th PE in the input layer Fx, i = 1, 2, ..., n;
$y_j$ is the j'th PE in the output layer Fy, j = 1, 2, ..., p; and
$w_{ij}$ is the value of the connection weight between $x_i$ and
$y_j$. Each set of connections that abuts an Fy PE, say $y_j$, acts
as a reference vector $W_j = (w_{1j}, w_{2j}, ..., w_{nj})$
representing the class j. The reference vector, $W_j$, that is
closest to the input, $A_k$, should provide the highest activation
value. If the input patterns $A_k$, k = 1, 2, ..., m, and the
reference vectors $W_j$, j = 1, 2, ..., p, are normalized to
Euclidean unit length, then the following relationship holds

\[ \sum_{i=1}^{n} a_{ki} w_{ij} \le 1 \qquad (28) \]

where the more similar $A_k$ is to $W_j$, the closer the dot product
is to unity (see §3.4.1.). The dot product values, $y_j$, are used
as the initial values for winner-take-all competitive interactions
(see §4.3.4.). The result of these interactions is identical to
searching the Fy PEs and finding the PE with the largest dot product
value, using the equation

\[ y_j = \begin{cases} 1 & \text{if } y_j > y_h \text{ for all } h \ne j \\ 0 & \text{otherwise} \end{cases} \qquad (29) \]

The Fy PE with the highest dot product value is called the winning
PE. The reference vector associated with the winning PE is the
winning reference vector.

Step 2: Adjust winning Fy PE's connection values. In competitive
learning with winner-take-all dynamics like those described above,
there is only one set of connection weights adjusted: the connection
weights of the winning reference vector. The equation that
automatically adjusts the winning reference vector and no others is

\[ w_{ij}^{new} = w_{ij}^{old} + \alpha(t)\, y_j\, (a_{ki} - w_{ij}^{old}) \qquad (30) \]

where $\alpha(t)$ is a positive, monotonically decreasing function
of time. The result of this operation is the motion of the reference
vector toward the input vector. Over many presentations of the data
vectors (on the order of 10,000 or more), the reference vectors will
become the centroids of data clusters (Kohonen, 1986).
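The two steps can be combined into one short routine. In this sketch the reference vectors are stored row-wise (`w[j]` holds $W_j$), which is a layout choice made for the example, and `competitive_step` is a hypothetical name; the learning rate is passed in as a plain number rather than as a function of time.

```python
def competitive_step(w, a, alpha):
    """One competitive learning step; w[j] is the reference vector of class j."""
    # Step 1: dot product of the input with every reference vector.
    y = [sum(ai * wi for ai, wi in zip(a, wj)) for wj in w]
    winner = max(range(len(w)), key=lambda j: y[j])
    # Step 2: move only the winning reference vector toward the input.
    w[winner] = [wi + alpha * (ai - wi) for ai, wi in zip(a, w[winner])]
    return winner


refs = [[1.0, 0.0], [0.0, 1.0]]
winner = competitive_step(refs, [0.8, 0.6], alpha=0.5)
```

After the call, the first reference vector has moved halfway toward the input while the second is untouched, which is the behaviour the winner-take-all output $y_j$ enforces in equation (30).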

There have been several variations of this algorithm (cf. Simpson,
1990a), but one of the most important is the conscience mechanism
(DeSieno, 1988). A conscience is added to each Fy PE that allows the
PE to become a winner only if it has won no more than an
equiprobable share of the competitions. The equiprobable winning
constraint improves both the quality of solution and the learning
time. Neural networks that employ competitive learning include
Learning Vector Quantization (Kohonen, 1984), Self-Organizing
Feature Maps (Kohonen, 1984), Adaptive Resonance Theory I (Carpenter
& Grossberg, 1987a), and Adaptive Resonance Theory II (Carpenter &
Grossberg, 1987b).

5.6. Min-Max Learning

Min-max classifier systems utilize a pair of vectors for each class
(see §3.4.3.). The class j is represented by the PE $y_j$ and
defined by the abutting vectors $V_j$ (the min vector) and $W_j$
(the max vector). Learning in a min-max neural system is done using
the equation

\[ v_{ij}^{new} = \min(a_{ki}, v_{ij}^{old}) \qquad (31) \]

for the min vector and

\[ w_{ij}^{new} = \max(a_{ki}, w_{ij}^{old}) \qquad (32) \]

for the max vector. If the min and max vectors are constrained to
lie between 0 and 1 along each dimension, it is possible to think of
each reference vector as a fuzzy set (Simpson, 1990b). Within this
framework, the fuzzy intersection of two vectors, $A_k$ and $V_j$,
is represented by equation (31) and the fuzzy union of two vectors,
$A_k$ and $W_j$, is represented by equation (32).
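A minimal sketch of the min-max update for one class follows. The function name `minmax_learn` is hypothetical, and the example simply shows the min and max vectors stretching to enclose a new pattern.

```python
def minmax_learn(v, w, a):
    """Update the min vector v and max vector w of one class with pattern a."""
    new_v = [min(ai, vi) for ai, vi in zip(a, v)]  # componentwise minimum
    new_w = [max(ai, wi) for ai, wi in zip(a, w)]  # componentwise maximum
    return new_v, new_w


v_min, w_max = minmax_learn([0.4, 0.5], [0.6, 0.7], [0.3, 0.9])
```

The pattern lies below the old minimum in the first dimension and above the old maximum in the second, so one component of each vector changes and the class region grows just enough to contain it.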

5.7.  Error  Correction  Learning 

Error  correction  learning  adjusts  the  con¬ 
nection  weights  between  PEs  in  proportion  to 
the difference between the desired and computed
values of each output layer PE. Two layer
error  correction  learning  is  able  to  capture  lin¬ 
ear  mappings  between  input  and  output  pat¬ 
terns.  Multi-layer  error  correction  learning  is 
able  to  capture  nonlinear  mappings  between  the 
inputs  and  outputs.  In  the  following  two  sec¬ 
tions,  each  of  these  learning  techniques  will  be 
described. 

5.7.1.  Two-Layer  Error  Correction  Learn¬ 
ing 

Consider the two-layer network shown in Figure 13. Assume that the
weights, W, are initialized to small random values (see §4.6.). The
input pattern, $A_k$, is passed through the connection weights, W,
to produce a set of Fy PE values, $Y = (y_1, y_2, ..., y_p)$. The
difference between the computed output values, Y, and the desired
output pattern values, $B_k = (b_{k1}, ..., b_{kp})$, is the error.
Computing the error for each Fy PE is done using the equation

\[ \delta_j = b_{kj} - y_j \qquad (33) \]

Figure 13: Two-Layer Network

The error is used to adjust the connection weights using the
equation

\[ w_{ij}^{new} = w_{ij}^{old} + \alpha\, \delta_j\, a_{ki} \qquad (34) \]

where the positive valued constant $\alpha$ is the learning rate.

The foundations for the learning rule described by equations (33)
and (34) are solid. By realizing that the best solution can be
attained when the errors for a given pattern across all the output
PEs, $y_j$, are minimized, the following cost function can be
constructed

\[ E = \frac{1}{2} \sum_{j=1}^{p} (b_{kj} - y_j)^2 \qquad (35) \]

When E is zero, the mapping from input to output is perfect for the
given pattern. By moving in the opposite direction of the gradient
of the cost function with respect to the weights, the optimal
solution can be achieved (assuming each movement along the gradient,
$\alpha$, is sufficiently small). Restated mathematically, with
$y_j = \sum_i a_{ki} w_{ij}$, the two-layer error correction
learning algorithm is computed as follows

\[ \frac{\partial E}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \left[ \frac{1}{2} \sum_{j=1}^{p} (b_{kj} - y_j)^2 \right] = -(b_{kj} - y_j)\, a_{ki} = -\delta_j a_{ki} \qquad (36) \]

Although the cost function is only with respect to a single pattern,
it has been shown (Widrow & Hoff, 1960) that the motion in the
opposite direction of the gradient for each pattern, when taken in
aggregate, acts as a noisy gradient motion that still achieves the
proper end result.
The  Perceptron  (Rosenblatt,  1962)  and  the 
Adaline  (Widrow  &  Hoff,  1960),  two  of  the 
most  prominent  early  neural  networks, 
employed  error  correction  learning.  In  addi¬ 
tion,  the  Brain-State-in-a-Box  (Anderson,  et 
al.,  1977)  uses  the  two-layer  error  correction 
procedure  described  above  for  one-layer 
autoassociative  encoding. 

5.7.2.  Multi-layer  Error  Correction  Learn¬ 
ing 

A problem that once plagued error correction learning was its
inability to extend learning beyond a two-layer network. By
remaining a two-layer learning rule, only linear mappings could be
acquired. There had been several attempts to extend the two-layer
error correction learning algorithm to multiple layers, but the same
problem kept arising: how much of the output layer PE error is each
hidden layer PE responsible for? Using the three-layer neural
network in Figure 14 to explain, the problem of multi-layer learning
(in this case three-layer learning) was calculating the amount of
error each hidden layer PE, $y_i$, should be credited for an output
layer PE's error. This problem, called the credit assignment problem
(Barto, 1984; Minsky, 1961), was solved through the realization that
a continuously differentiable threshold function for the hidden
layer PEs would allow the chain rule of partial differentiation to
be used to calculate weight changes for any weight in the network.
Using the three-layer network in Figure 14 to illustrate the
multi-layer error correction learning algorithm, the output error
across all the Fz PEs is found using the cost function

\[ E = \frac{1}{2} \sum_{j=1}^{q} (b_{kj} - z_j)^2 \qquad (37) \]

The output of an Fz PE, $z_j$, is computed using the equation

\[ z_j = \sum_{i=1}^{p} y_i w_{ij} \qquad (38) \]

and each Fy (hidden layer) PE, $y_i$, is computed using the equation

\[ y_i = f(r_i), \qquad r_i = \sum_{h=1}^{n} a_{kh} v_{hi} \qquad (39) \]

and the hidden layer PE threshold function is

\[ f(x) = \frac{1}{1 + e^{-x}} \qquad (40) \]

Figure 14: Three-Layer Network

Using the same principle as described in the previous section, the
weight adjustments will be performed by moving along the cost
function in the opposite direction of the gradient to a minimum
(where the minimum is considered to be the input-output mapping
producing the smallest amount of total error). The connection
weights between the Fy and Fz PEs are adjusted using the same form
of equation derived earlier for two-layer error correction learning,
yielding

\[ \frac{\partial E}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \left[ \frac{1}{2} \sum_{j=1}^{q} (b_{kj} - z_j)^2 \right] = -(b_{kj} - z_j)\, y_i = -\delta_j y_i \qquad (41) \]

where $\delta_j = b_{kj} - z_j$ is the error of the j'th Fz PE.

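Next, the adjustments to the connection weights between the Fx and
Fy PEs are found using the chain rule of partial differentiation,
yielding

\[ \frac{\partial E}{\partial v_{hi}} = \sum_{j=1}^{q} \frac{\partial E}{\partial z_j} \frac{\partial z_j}{\partial y_i} \frac{\partial y_i}{\partial r_i} \frac{\partial r_i}{\partial v_{hi}} = -\sum_{j=1}^{q} \delta_j\, w_{ij}\, f'(r_i)\, a_{kh} \qquad (42) \]

where $\delta_j = b_{kj} - z_j$, $r_i = \sum_h a_{kh} v_{hi}$ is the
net input to the i'th hidden layer PE, and $f'(r_i) = y_i (1 - y_i)$
for the sigmoid threshold function.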

The multi-layer version of this algorithm is commonly referred to as
the backpropagation of errors learning rule, or simply
backpropagation. Utilizing the chain rule, it is possible to
calculate weight changes for an arbitrary number of layers. The
number of iterations that must be performed for each pattern in the
data set is large, making this off-line learning algorithm very slow
to train. Using equations (41) and (42), the weight adjustment
equations are

\[ w_{ij}^{new} = w_{ij}^{old} - \alpha \frac{\partial E}{\partial w_{ij}} \qquad (43) \]

and

\[ v_{hi}^{new} = v_{hi}^{old} - \beta \frac{\partial E}{\partial v_{hi}} \qquad (44) \]

where $\alpha$ and $\beta$ are positive valued constants that
regulate the amount of adjustments made with each gradient move.
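To make the chain-rule step concrete, the following sketch computes both gradients for a three-layer network with sigmoid hidden PEs and linear output PEs, then applies one gradient move. It is an illustration under those assumptions only; the function names and the small example network are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(v, w, a):
    """v[h][i]: Fx-to-Fy weights, w[i][j]: Fy-to-Fz weights."""
    n, p, q = len(v), len(w), len(w[0])
    y = [sigmoid(sum(a[h] * v[h][i] for h in range(n))) for i in range(p)]
    z = [sum(y[i] * w[i][j] for i in range(p)) for j in range(q)]
    return y, z


def error(v, w, a, b):
    _, z = forward(v, w, a)
    return 0.5 * sum((bj - zj) ** 2 for bj, zj in zip(b, z))


def gradients(v, w, a, b):
    """Return (dE/dv, dE/dw) for one pattern pair."""
    n, p, q = len(v), len(w), len(w[0])
    y, z = forward(v, w, a)
    delta = [b[j] - z[j] for j in range(q)]
    dE_dw = [[-delta[j] * y[i] for j in range(q)] for i in range(p)]
    dE_dv = [[-sum(delta[j] * w[i][j] for j in range(q))
              * y[i] * (1.0 - y[i]) * a[h]
              for i in range(p)] for h in range(n)]
    return dE_dv, dE_dw


def backprop_step(v, w, a, b, alpha, beta):
    """One move against the gradient for both weight sets."""
    dE_dv, dE_dw = gradients(v, w, a, b)
    new_w = [[w[i][j] - alpha * dE_dw[i][j] for j in range(len(w[0]))]
             for i in range(len(w))]
    new_v = [[v[h][i] - beta * dE_dv[h][i] for i in range(len(v[0]))]
             for h in range(len(v))]
    return new_v, new_w


v0 = [[0.1, -0.2], [0.3, 0.4]]
w0 = [[0.5], [-0.6]]
a0, b0 = [1.0, 0.5], [0.7]
v1, w1 = backprop_step(v0, w0, a0, b0, alpha=0.1, beta=0.1)
```

A useful sanity check on any such implementation is to compare the analytic gradients with finite differences of the cost function; a mismatch usually points to an indexing or sign error in the chain rule.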


Extending  the  backpropagation  to  utilize 
mean-variance  connections  (see  §3.4.2.) 
between  the  Fx  and  Fy  PEs  is  straightforward 
(Robinson,  Niranjan  &  Fallside,  1988).  Figure 
15  shows  the  topology  of  a  three-layer  mean- 
variance  version  of  the  multi-layer  error  correc¬ 
tion  learning  algorithm.  The  hidden  layer,  Fy, 
PE  values  are  computed  with  the  equation 


yi  =  gin); 


(45) 


where $u_{hi}$ represents the mean connection strength between the
h'th Fx and i'th Fy PEs, $v_{hi}$ is the variance connection
strength between the h'th Fx and i'th Fy PEs, and the threshold
function is the Gaussian function


S(;()  =  e  ^  (46) 

The output PE, Fz, values are then formed from the linear
combination of the hidden layer Gaussians using the equation

\[ z_j = \sum_{i=1}^{p} y_i w_{ij} \qquad (47) \]

where $w_{ij}$ is the connection strength between the i'th Fy and
j'th Fz PEs. Computing the gradients for each set of weights yields
the following set of equations
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dE 

du^i  dzjdy^dr-^u^i 


=  i  (^r 

i=\  V  ^hi  ) 
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j  =  i  \ 
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(49) 

(50) 


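Using these equations, the update equations are then

\[ u_{hi}^{new} = u_{hi}^{old} - \alpha \frac{\partial E}{\partial u_{hi}} \qquad (51) \]

\[ v_{hi}^{new} = v_{hi}^{old} - \beta \frac{\partial E}{\partial v_{hi}} \qquad (52) \]

\[ w_{ij}^{new} = w_{ij}^{old} - \gamma \frac{\partial E}{\partial w_{ij}} \qquad (53) \]

where $\alpha$, $\beta$, and $\gamma$ are positive valued constants
that regulate the amount of adjustments made with each gradient
move.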


The backpropagation algorithm was introduced by Werbos (1974), and
later independently rediscovered by Parker (1982) and Rumelhart,
Hinton, and Williams (1986). The presentation of the algorithm here
has been brief. There are several variations on the algorithm (cf.
Simpson, 1990a) including: alternative multi-layer topologies,
methods of improving the learning time, methods for optimizing the
number of hidden layers and the number of hidden layer PEs in each
hidden layer, and many more. Although there are many issues that
remain unresolved with the backpropagation of errors learning
procedure, such as the proper number of training parameters, the
existence of local minima during training, the extremely long
training time, and the optimal number and configuration of hidden
layer PEs, the ability of this learning method to automatically
capture nonlinear mappings remains a significant strength.

5.8. Reinforcement Learning


The initial idea for reinforcement learning was introduced by
Widrow, Gupta & Maitra (1973) and has been championed by Williams
(1986). Reinforcement learning is similar to error correction
learning in that weights are reinforced for properly performed
actions and punished for poorly performed actions. The difference
between these two supervised learning techniques is that error
correction learning utilizes more specific error information by
collecting error values from each output layer PE, while
reinforcement learning uses non-specific error information to
determine the performance of the network. Where error correction
learning has a whole vector of values that it uses for error
correction, only one value is used to describe the output layer's
performance during reinforcement learning. This form of learning is
ideal in situations where specific error information is not
available, but overall performance information is, such as
prediction and control.

A two-layer neural network such as the one found in Figure 16
serves as a good framework for the reinforcement learning algorithm.
The general reinforcement learning equation is

\[ w_{ij}^{new} = w_{ij}^{old} + \alpha\, (r - \theta_j)\, e_{ij} \qquad (54) \]

where r is the scalar success/failure value provided by the
environment, $\theta_j$ is the reinforcement threshold value for the
j'th Fy PE, $e_{ij}$ is the canonical eligibility of the weight from
the i'th Fx PE to the j'th Fy PE, and $0 < \alpha < 1$ is a
constant-valued learning rate. In error correction learning,
learning is controlled by gradient descent in error space. In
reinforcement learning it is gradient descent in probability space.
The canonical eligibility of $w_{ij}$ is dependent on a previously
selected probability distribution that is used to determine if the
computed output value equals the desired output value and is defined
as

\[ e_{ij} = \frac{\partial}{\partial w_{ij}} \ln g_j \qquad (55) \]

where $g_j$ is the probability of the desired output equalling the
computed output, defined as

\[ g_j = \Pr(y_j = b_{kj} \mid W_j, A_k) \qquad (56) \]

which is read as the probability that $y_j$ equals $b_{kj}$ given
the input, $A_k$, and the corresponding weight vector, $W_j$.
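The eligibility term depends on which probability distribution is chosen for the output PE. The sketch below assumes one specific and common choice, a binary PE that fires with probability given by a sigmoid of its net input; under that assumption the derivative of the log-probability has a simple closed form. The distribution and all names are assumptions made for illustration and are not prescribed by the text.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def log_prob(w_j, a, y):
    """ln g for a binary PE that outputs 1 with probability sigmoid(net)."""
    p = sigmoid(sum(ai * wi for ai, wi in zip(a, w_j)))
    return math.log(p if y == 1 else 1.0 - p)


def eligibility(w_j, a, y):
    """d(ln g)/dw_i for the assumed distribution: (y - p) * a_i."""
    p = sigmoid(sum(ai * wi for ai, wi in zip(a, w_j)))
    return [(y - p) * ai for ai in a]


def reinforcement_step(w_j, a, y, r, theta, alpha):
    """Weight update: scale the eligibility by (r - theta)."""
    e = eligibility(w_j, a, y)
    return [wi + alpha * (r - theta) * ei for wi, ei in zip(w_j, e)]


w_j = [0.2, -0.4, 0.1]
a_k = [1.0, 0.5, -1.0]
e_fire = eligibility(w_j, a_k, y=1)
w_new = reinforcement_step(w_j, a_k, y=1, r=1.0, theta=0.5, alpha=0.1)
```

When the reinforcement r exceeds the threshold, the update raises the probability of the output that was just produced; when r falls below the threshold, it lowers that probability.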

Neural  networks  that  employ  reinforcement 
learning  include  the  Adaptive  Heuristic  Critic 
(Barto,  Sutton  &  Anderson,  1983)  and  the 
Associative  Reward-Penalty  neural  network 
(Barto,  1985). 

5.9.  Stochastic  Learning 

Stochastic  learning  uses  random  processes, 
probability,  and  an  energy  relationship  to  adjust 
connection  weights  in  a  multi-layered  neural 
network.  Using  the  three-layer  neural  network 
shown  in  Figure  14  to  illustrate  the  learning 
algorithm,  the  stochastic  learning  procedure  is 
described  as  follows: 

1.  Randomly  change  the  output  value  of  a  hid¬ 
den  layer  PE  (the  hidden  layer  PEs  utilize  a 
binary  step  threshold  function). 

2. Evaluate the change using the resulting difference in the neural
network's energy as a guide. If the energy after the change is
lower, keep the change. If the energy is not lower after the random
change, accept the change according to a pre-chosen probability
distribution.

3.  After  several  random  changes,  the  network 
will  eventually  become  “stable.”  Collect  the 
values  of  the  hidden  layer  PEs  and  the  output 
layer  PEs. 

4.  Repeat  steps  1-3  for  each  pattern  pair  in  the 
data  set,  then  use  the  collected  values  to  statis¬ 
tically  adjust  the  weights. 

5.  Repeat  steps  1-4  until  the  network  perfor¬ 
mance  is  adequate. 

The probabilistic acceptance of higher energy states, despite poorer
performance, allows the neural network to escape local energy minima
in favor of a deeper energy minimum. This learning process, founded
in simulated annealing (Kirkpatrick, Gelatt & Vecchi, 1983), is
governed by a "temperature" parameter that slowly decreases the
number of probabilistically accepted higher energy states.
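The acceptance test in step 2 can be sketched as a single function. The exponential (Boltzmann-style) acceptance probability used here is one common choice for the pre-chosen distribution and is an assumption of this example, as are the function and parameter names.

```python
import math
import random


def accept_change(delta_energy, temperature, rng=random):
    """Decide whether to keep a random change to a hidden layer PE.

    delta_energy : energy after the change minus energy before it
    temperature  : positive annealing parameter, lowered slowly over time
    """
    if delta_energy < 0:
        return True  # lower energy: always keep the change
    # Otherwise accept with a probability that shrinks as the energy
    # increase grows or the temperature falls (assumed distribution).
    return rng.random() < math.exp(-delta_energy / temperature)


random.seed(1)
hot = sum(accept_change(1.0, 10.0) for _ in range(1000))   # high temperature
cold = sum(accept_change(1.0, 0.1) for _ in range(1000))   # low temperature
```

At the high temperature most uphill moves are accepted, while at the low temperature almost none are, which is how the slowly decreasing temperature steers the network from exploration toward a deep minimum.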

The  Boltzmann  Machine  (Ackley,  Hinton 
&  Sejnowski,  1985)  was  the  first  neural  net¬ 
work  to  employ  stochastic  learning.  Szu  (1986) 
has  refined  the  procedure  by  employing  the 
Cauchy  distribution  function  in  place  of  the 
Gaussian distribution function, resulting in a
network that converges to a solution much
more quickly.

5.10.  Hardwired  Systems 

There  are  some  neural  networks  that  have 
their  connection  weights  predetermined  for  a 
specific  problem.  These  weights  are  “hard¬ 
wired”  in  that  they  do  not  change  once  they 
have  been  determined.  The  most  popular  hard¬ 
wired  systems  are  the  neural  optimization  net¬ 
works  (Hopfield  &  Tank,  1985).  Neural 
optimization  works  by  designing  a  cost  func¬ 
tion  that,  when  minimized,  solves  an  uncon¬ 
strained  optimization  problem.  By  translating 
the  energy  function  into  a  set  of  weights  and 


bias  values,  the  neural  network  becomes  a  par¬ 
allel  optimizer.  Given  the  initial  values  of  the 
problem,  the  network  will  run  to  a  stable  solu¬ 
tion.  This  technique  has  been  applied  to  a  wide 
range  of  problems  (cf.  Simpson,  1990a), 
including  scheduling,  routing  and  resource 
optimization  (see  §4.3.3.). 

Two  other  types  of  hardwired  networks 
include  the  Avalanche  Matched  Filter  (Gross- 
berg,  1969;  Hecht-Nielsen,  1990)  and  the  Prob¬ 
abilistic  Neural  Network  (Specht,  1990).  These 
networks  are  considered  hardwired  systems 
because  the  data  patterns  are  normalized  to  unit 
length  and  used  as  connection  weights.  Despite 
the lack of an adaptive learning procedure, each
of these neural networks is very powerful in
its own right.

5.11.  Summary  of  Learning  Procedures 

There  are  several  attributes  of  each  of  the 
neural  network  learning  algorithms  that  have 
been  described.  Table  1  describes  six  key 
attributes  of  the  learning  procedures  described 
above: 

•  Training Time - How long does it take the
learning technique to adequately capture
information (quick, slow, very slow, and
extremely slow)?

•  On-Line/Off-Line  -  Is  the  learning  tech¬ 
nique  an  on-line  or  an  off-line  learning 
algorithm? 

•  Supervised/Unsupervised - Is the learning
technique a supervised or unsupervised
learning procedure?

•  Linear/Nonlinear  -  Is  the  learning  tech¬ 
nique  capable  of  capturing  nonlinear  map¬ 
pings? 

•  Structural/Temporal  -  Does  the  learning 
algorithm  capture  structural  information, 
temporal  information,  or  both? 

•  Storage  Capacity  -  Is  the  information  stor¬ 
age  capacity  good  relative  to  the  number  of 
connections  in  the  network? 

The information provided in Table 1 is meant as a guide and is not
intended to be a precise description of the qualities of each neural
network. For a more detailed description of each neural network
learning algorithm, please refer to Simpson, 1990a, Hecht-Nielsen,
1990, or Maren, Harston & Pap, 1990.


6.  NEURAL  NETWORK  RECALL 

The  previous  section  emphasized  the  stor¬ 
age  of  information  through  a  wide  range  of 
learning  procedures.  In  this  section,  the  empha¬ 
sis  is  retrieving  information  already  stored  in 
the  network.  Some  of  the  recall  equations  have 
been  introduced  as  a  part  of  the  learning  pro¬ 
cess.  Others  will  be  introduced  here  for  the  first 
time.  The  recall  techniques  described  here  fall 
into  two  broad  categories:  feedforward  recall 
and  feedback  recall. 

6.1. Feedforward Recall

Feedforward recall is performed in networks that do not have
feedback connections. The most common feedforward recall technique
is the linear combiner (see §3.4.1.) followed by a threshold
function

\[ y_j = f\left( \sum_{i=1}^{n} x_i w_{ij} \right) \qquad (57) \]

where the threshold function f is one of those described in §3.5.
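As a brief illustration, the recall equation amounts to a weighted sum per output PE followed by the threshold function. The step function used below is just one possible choice of f, and the names are hypothetical.

```python
def feedforward_recall(w, x, f):
    """Linear combiner followed by threshold f; w[i][j] links x_i to y_j."""
    n, p = len(w), len(w[0])
    return [f(sum(x[i] * w[i][j] for i in range(n))) for j in range(p)]


def step(s):
    """Binary step threshold (one possible choice of f)."""
    return 1 if s > 0 else 0


outputs = feedforward_recall([[1.0, -1.0], [1.0, 1.0]], [1.0, 0.0], step)
```

The two output PEs receive net inputs of 1 and -1, so only the first one fires.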


For  a  feedforward  network  using  dual  con¬ 
nections  (see  §3.4.2.)  where  one  set  of  connec¬ 
tion  weights,  W,  represents  the  mean  and  the 
other  set  of  connection  weights,  V,  represents 
the  variance,  the  recall  equation  is 


where  g  is  the  Gaussian  threshold  function  (see 
§3.5.5.). 

For a feedforward network using dual connections where one set of
connection weights, V, represents the min vector and the other set
of connection weights, W, represents the max vector (see §3.4.3.),
and the system is confined to the unit hypercube, the recall
equation is

\[ y_j = \left(1 - \mathrm{supersethood}(X, W_j)\right) \times \left(1 - \mathrm{subsethood}(X, V_j)\right) = \mathrm{subsethood}(X, W_j) \times \mathrm{supersethood}(X, V_j) \qquad (59) \]

where  the  supersethood  operation  is  defined  as 
supersethood(X,  T)  = 

n 

][)max(0,x,-yj) 


»■=  1 


Table 1: Neural Network Learning Algorithms

Learning Algorithm                      Training Time    On-Line/Off-Line  Supervised/Unsupervised  Linear/Nonlinear  Structural/Temporal  Storage Capacity
Hebbian Learning                        Fast             On-line           Unsupervised             Linear            Structural           Poor
Principal Component Learning            Slow             Off-line          Unsupervised             Linear            Structural           Good
Differential Hebbian Learning           Fast             On-line           Unsupervised             Linear            Temporal             Undetermined
Competitive Learning                    Slow             On-line           Unsupervised             Linear            Structural           Good
Min-Max Learning                        Fast             On-line           Unsupervised             Linear            Structural           Good
Two-Layer Error Correction Learning     Slow             Off-line          Supervised               Linear            Both                 Good
Multi-Layer Error Correction Learning   Very Slow        Off-line          Supervised               Nonlinear         Both                 Very Good
Reinforcement Learning                  Extremely Slow   Off-line          Supervised               Nonlinear         Both                 Good
Stochastic Learning                     Extremely Slow   Off-line          Supervised               Nonlinear         Structural           Very Good
Hardwired Systems                       Fast             Off-line          Supervised               Nonlinear         Structural           Good

Referring to Figure 5, equation (59) measures the degree to which
the input pattern $A_k$ falls between the min and max vectors of
class j, where a value of 1 means that $A_k$ falls completely
between $V_j$ and $W_j$, and the closer $y_j$ is to 0, the greater
the disparity between $A_k$ and the class j, with a value of 0
meaning that $A_k$ is completely outside of the class.

6.2.  Feedback  Recall 

Those networks that have feedback connections employ a feedback
recall equation of the form

\[ x_j(t+1) = (1 - \alpha)\, x_j(t) + \beta \sum_{i=1}^{n} f(x_i(t))\, w_{ij} + a_{kj} \qquad (61) \]

where $x_j(t+1)$ is the value of the j'th element in a single-layer
neural network at time t+1, f is a monotonic non-decreasing function
(e.g. sigmoid function), $\alpha$ is a positive constant that
regulates the amount of decay a PE value has during a unit interval
of time, $\beta$ is a positive constant that regulates the amount of
feedback the other PEs provide the j'th PE, and $a_{kj}$ is the
constant valued input from the j'th component of the k'th input
pattern.

One issue that arises in feedback recall systems is stability. Stability is
achieved when a network's PEs cease to change in value after they have been
given an initial set of inputs, A_k, and have processed for a while. If the
network did not stabilize, it would not be of much use. Ideally, the initial
inputs to the feedback neural network would represent the input pattern and
the stable state that the network reached would represent the nearest
neighbor output of the system.

An important theorem was presented by Cohen & Grossberg (1983), who proved
that, for a wide class of neural networks under a set of minimal constraints,
the network would become stable in a finite period of time given any initial
conditions. This theorem dealt with systems that had weights that were fixed.
In an extension to the Cohen-Grossberg Theorem, Kosko (1990) showed that a
neural network could learn and recall at the same time, and yet still remain
stable.
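
The idea of iterating a feedback recall rule until the PE values stop changing can be sketched as follows. This is only an illustration under stated assumptions: it assumes a recall rule of the form x_j(t+1) = (1 - α) x_j(t) + β Σ_i f(x_i(t)) w_ij + a_kj, uses tanh as the monotonic function f, and picks a small hypothetical weight matrix and values of α and β for which the iteration is a contraction. The function name `feedback_recall` and all numeric values are illustrative and do not come from the text.

```python
import math


def feedback_recall(weights, pattern, alpha=0.5, beta=0.1, tol=1e-9, max_steps=10_000):
    """Iterate an assumed feedback recall rule until the PE values stop changing.

    weights[i][j] is the (assumed) connection weight from PE i to PE j and
    pattern[j] is the constant input to PE j. Returns (state, steps).
    """
    n = len(pattern)
    x = list(pattern)  # initial PE values: the input pattern itself
    for step in range(1, max_steps + 1):
        fx = [math.tanh(v) for v in x]  # f: monotonic non-decreasing squashing function
        new_x = [
            (1.0 - alpha) * x[j]  # decay term
            + beta * sum(fx[i] * weights[i][j] for i in range(n))  # feedback term
            + pattern[j]  # constant input term
            for j in range(n)
        ]
        change = max(abs(new_x[j] - x[j]) for j in range(n))
        x = new_x
        if change < tol:  # stable: no PE changed by more than tol
            return x, step
    raise RuntimeError("network did not stabilize within max_steps")


# Hypothetical three-PE network with symmetric weights and no self-connections.
weights = [[0.0, 1.0, -1.0], [1.0, 0.0, 1.0], [-1.0, 1.0, 0.0]]
pattern = [0.2, -0.1, 0.4]
state, steps = feedback_recall(weights, pattern)
```

With these particular constants each update shrinks the distance between successive states, so the loop terminates. For other weights or gains, stability is not guaranteed by this sketch and has to be argued separately, which is the role of the theorems cited above.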

6.3. Interpolation vs. Nearest-Neighbor Responses

In addition to recall operations being either feedforward or feedback, there
is another important attribute associated with recall: output response. There
are two types of neural network output response: nearest-neighbor and
interpolative. Figure 17 illustrates the difference. Assume that the three
face/disposition pairs shown in Figure 17(a) have been stored in a neural
network. If an input that is a combination of two of the faces is presented
to the network, there are two ways that a neural network might respond. If
the output is a combination of the two correct outputs associated with the
given inputs, then the network has performed an interpolation (see Figure
17(b)). Alternatively, the network might determine which of the stored faces
is most closely associated with the input and respond with the associated
output for that face (see Figure 17(c)). The feedforward pattern matching
neural networks are typically interpolative response networks (e.g.
Backpropagation and Linear Associative Memory). The feedforward pattern
classification networks (e.g. Learning Vector Quantization) and the feedback
pattern matching networks (e.g. Hopfield Network and Bidirectional
Associative Memory) are typically nearest-neighbor response networks.
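
The difference between the two response types can be illustrated with a small lookup over stored pairs. This is a minimal sketch, not a neural network: the stored vectors are hypothetical one-hot encodings of the three faces, and inverse-distance weighting is just one assumed way of forming an interpolated response.

```python
import math

# Hypothetical stored associations: input pattern -> output pattern.
STORED = [
    ((1.0, 0.0, 0.0), (1.0, 0.0, 0.0)),  # "happy" face -> happy
    ((0.0, 1.0, 0.0), (0.0, 1.0, 0.0)),  # "sad" face   -> sad
    ((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)),  # "angry" face -> angry
]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_recall(stored, pattern):
    """Return the output paired with the closest stored input."""
    _, output = min(stored, key=lambda pair: distance(pair[0], pattern))
    return output


def interpolative_recall(stored, pattern):
    """Blend all stored outputs, weighting each by inverse distance to its input."""
    weights = []
    for stored_input, output in stored:
        d = distance(stored_input, pattern)
        if d == 0.0:
            return output  # exact match: no blending needed
        weights.append(1.0 / d)
    total = sum(weights)
    size = len(stored[0][1])
    return tuple(
        sum(w * output[k] for w, (_, output) in zip(weights, stored)) / total
        for k in range(size)
    )


mixed_face = (0.6, 0.0, 0.4)  # mostly "happy", partly "angry"
nearest = nearest_neighbor_recall(STORED, mixed_face)
blended = interpolative_recall(STORED, mixed_face)
```

Here `nearest` is the stored "happy" output unchanged, while `blended` is a mixture in which the "happy" component is largest and the "angry" component second largest.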

7. NEURAL NETWORK TAXONOMY

Several different topologies, learning algorithms, and recall equations have
been described. Attempts at organizing the various configurations quickly
become unwieldy unless some simple, yet accurate, taxonomy can be applied.
The two most prevalent aspects of neural networks, learning supervision and
information flow, seem ideally suited to address this need. Table 2 utilizes
these criteria to organize the neural networks described above into a matrix
with learning supervision on the ordinate and recall information flow on the
abscissa.

Figure 17: Interpolative vs. Nearest-Neighbor Recall
(a) Stored associations: FACES -> DISPOSITION (Happy, Sad, Angry).
(b) INTERPOLATIVE RECALL: Respond with an interpolation of all stored values. Example response: Happily Angry (Devious).
(c) NEAREST-NEIGHBOR RECALL: Respond with the closest of all stored values. Example response: Angry.


8. COMPARING NEURAL NETS TO OTHER INFORMATION PROCESSING METHODS

There are several information processing techniques that have capabilities
similar to the neural network learning algorithms described above. Despite
the possibility of comparable solutions to a given problem, there are several
additional aspects of a neural network solution that are appealing,
including: fault-tolerance through the large number of connections, parallel
implementations that allow fast processing, and on-line adaptation that
allows the networks to constantly change according to the needs of the
environment. The following sections briefly describe some of the alternative
methods that are used for pattern recognition, clustering, control, and
statistical analysis.

8.1.  Stochastic  Approximation 

The method of stochastic approximation was first introduced by Robbins and
Monro (1951) as a method for finding a mapping between inputs and outputs
when the inputs and outputs are extremely noisy (i.e. the inputs and outputs
are stochastic variables). The stochastic approximation technique has been
shown to be identical to the two-layer error correction algorithm presented
in §5.7.1. (Kohonen, 1984) and the three-layer error correction algorithm
presented in §5.7.2. (White, 1989).
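
As an illustration, the Robbins-Monro procedure in its simplest one-dimensional root-finding form can be sketched as below. The noisy system `noisy_line`, its true response 2x - 4, the noise level, and the step-size schedule a_n = 1/n are all assumptions chosen for the example. The sketch also assumes the mean response increases with x.

```python
import random


def robbins_monro(noisy_measure, target, x0=0.0, steps=20_000):
    """Find x with E[noisy_measure(x)] == target using step sizes a_n = 1/n."""
    x = x0
    for n in range(1, steps + 1):
        # Move against the observed error; the shrinking step averages out the noise.
        x -= (1.0 / n) * (noisy_measure(x) - target)
    return x


rng = random.Random(0)  # fixed seed so the run is repeatable


def noisy_line(x):
    # Hypothetical system: true response 2x - 4, observed with unit-variance noise.
    return 2.0 * x - 4.0 + rng.gauss(0.0, 1.0)


estimate = robbins_monro(noisy_line, target=0.0)
```

Although every individual measurement is corrupted by noise, the estimate settles close to the true root x = 2, because the decreasing step sizes let later measurements refine rather than overwrite earlier ones.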

8.2.  Kalman  Filters 

A Kalman Filter is a technique for estimating, or predicting, the next state
of a system based upon a moving average of measurements driven by additive
white noise. The Kalman Filter requires a model of the relationship between
the inputs and the outputs to provide feedback that allows the system to
continuously perform its estimation. Kalman filters are primarily used for
control systems. Singhal and Wu (1989) have developed a method of using a
Kalman filter to train the weights of a multi-layer neural network. In some
recent work, Ruck, et al. (1990) have shown that the backpropagation
algorithm is a special case of the Extended Kalman Filter algorithm and have
provided several comparative examples of the two training algorithms on a
variety of data sets.
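
A scalar Kalman filter gives a feel for the predict/update cycle described above. This is a minimal sketch that assumes a random-walk model (the state is expected to persist from one step to the next). The function name `kalman_1d`, the variances, and the readings are illustrative only.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1000.0):
    """Scalar Kalman filter for a state modelled as a random walk.

    Returns the state estimate after each measurement.
    """
    x, p = x0, p0  # current estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the model says the state persists, but uncertainty grows.
        p += process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x += gain * (z - x)
        p *= 1.0 - gain
        estimates.append(x)
    return estimates


readings = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]  # noisy readings of a value near 5
estimates = kalman_1d(readings, process_var=1e-4, meas_var=0.25)
```

The large initial variance `p0` tells the filter to trust the first measurement almost completely. As the variance shrinks, each new reading moves the estimate less, which is the moving-average behaviour mentioned in the text.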

8.3.  Linear  and  Nonlinear  Regression 

Linear regression is a technique for fitting a line to a set of data points
such that the total squared distance between the line and the data points is
minimized. This technique, used widely in statistics (Spiegel, 1975), is
similar to the two-layer error correction learning algorithm described in
§5.7.1.
Table 2: Neural Network Taxonomies

RECALL INFORMATION FLOW: Feedback / Feedforward

Hopfield Networks (Amari, 1972; Hopfield, 1982)
ART1 & ART2 (Carpenter & Grossberg, 1987a & 1987b)
Bidirectional Associative Memory (Kosko, 1988; Simpson, 1990c)
Principal Component Networks (Oja, 1982; Sanger, 1989)
Linear Associative Memory (Anderson, 1970; Kohonen, 1972)
Associative Reward-Penalty (Barto, 1985)
Adaptive Heuristic Critic (Barto, Sutton & Anderson, 1983)
Drive-Reinforcement (Klopf, 1986)
Learning Vector Quantization (Kohonen, 1984)
Fuzzy Min-Max Classifier (Simpson, 1990b)
Learnmatrix (Steinbuch & Piske, 1963; Willshaw, 1980)

Supervised

Brain-State-in-a-Box (Anderson, et al., 1977)
Neural Optimization (Hopfield & Tank, 1985)
Boltzmann Machine (Ackley, Hinton & Sejnowski, 1985)
Neocognitron (Fukushima, 1988)
Avalanche Matched Filter (Grossberg, 1969; Hecht-Nielsen, 1990)
Sparse Distributed Memory (Kanerva, 1988)
Gaussian Potential Function Network (Lee & Kil, 1989)
Backpropagation (Werbos, 1974; Parker, 1982; Rumelhart, et al., 1986)
Perceptron (Rosenblatt, 1962)
Probabilistic Neural Network (Specht, 1990)
Cauchy Machine (Szu, 1986)
Adaline (Widrow & Hoff, 1960)

Nonlinear regression is a technique for fitting curves (nonlinear surfaces)
to data points. White (1990) points out that the threshold function used in
many error correction learning algorithms is a family of curves and the
adjustment of the weights that minimizes the overall mean-squared-error is
equivalent to curve fitting. In this sense, the backpropagation algorithm
described in §5.7.2 is an example of an automatic nonlinear regression
technique.
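
The link between regression and error correction learning can be made concrete with a small comparison. The sketch below fits a line by the closed-form least-squares formulas and then by a batch delta-rule (Widrow-Hoff style) update. The delta rule is used here as an assumed stand-in for the two-layer error correction algorithm, and the data, learning rate, and epoch count are hypothetical.

```python
def least_squares_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def delta_rule_line(xs, ys, rate=0.05, epochs=5000):
    """Fit the same line by repeated error correction (batch gradient descent)."""
    slope = intercept = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Adjust each weight in proportion to the average error it contributed to.
        slope -= rate * sum(e * x for e, x in zip(errors, xs)) / n
        intercept -= rate * sum(errors) / n
    return slope, intercept


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
exact = least_squares_line(xs, ys)
learned = delta_rule_line(xs, ys)
```

Both routines minimize the same mean-squared error, so `learned` converges to the same slope and intercept as `exact`. A batch update is used because a per-pattern update with a fixed rate would only hover near the least-squares solution.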

8.4.  Correlation 

Correlation is a method of comparing two patterns. One pattern is the
template and the other is the input. The correlation between the two patterns
is the dot product. Correlation is used extensively in pattern recognition
(Young & Fu, 1986) and signal processing (Elliot, 1987). In pattern
recognition the templates and inputs are normalized, allowing the dot product
operation to provide similarities based upon the angles between vectors. In
signal processing the correlation procedure is often used for comparing
templates with a time-series to determine when a specific sequence occurs
(this technique is commonly referred to as cross-correlation or matched
filters). The Hebbian learning techniques described in §5.2. are correlation
routines that store correlations in a matrix and compare the stored
correlations with the input pattern using inner products.
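
Both uses of correlation can be sketched in a few lines. The function names and the sample template and series below are illustrative, and the normalization assumes non-zero vectors.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def cosine_similarity(template, pattern):
    """Dot product of normalized vectors: depends only on the angle between them."""
    return dot(normalize(template), normalize(pattern))


def cross_correlate(template, series):
    """Slide the template along the series and record the dot product at each offset."""
    width = len(template)
    return [dot(template, series[i:i + width]) for i in range(len(series) - width + 1)]


template = [1.0, -1.0, 1.0]
series = [0.0, 0.0, 1.0, -1.0, 1.0, 0.0, 0.0]
scores = cross_correlate(template, series)
best_offset = scores.index(max(scores))
```

The largest cross-correlation score marks the offset at which the template sequence occurs in the series, which is the matched-filter use described above.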

8.5.  Bayesian  Classification 

The purpose of pattern classification is to determine to which class a given
pattern belongs. If the class boundaries are not cleanly separated and tend
to overlap, the classification system must find the boundary between the
classes that minimizes the average misclassification (error). The smallest
possible error relative to a predefined risk is referred to as the Bayes
error, and a classifier that minimizes Bayes error is called a Bayesian
classifier (Fukunaga, 1986). The Parzen approach to implementing a Bayesian
classifier utilizes a uniform kernel (typically the Gaussian function) to
approximate the probability density function of the data. A neural network
implementation of this approach (see §4.5.) is the Probabilistic Neural
Network (Specht, 1990).
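
A Parzen-window classifier with a Gaussian kernel can be sketched as follows. This is an illustration under simplifying assumptions: equal class priors, one shared kernel width `sigma`, and hypothetical two-dimensional training points. It is not a reproduction of Specht's network.

```python
import math


def gaussian_kernel(x, center, sigma):
    """Unnormalized Gaussian kernel centered on one training pattern."""
    squared = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-squared / (2.0 * sigma ** 2))


def parzen_classify(x, training, sigma=0.5):
    """Pick the class whose kernel density estimate at x is largest.

    training maps each class label to a list of example vectors. The common
    normalizing constant is omitted because it is the same for every class.
    """
    scores = {
        label: sum(gaussian_kernel(x, c, sigma) for c in points) / len(points)
        for label, points in training.items()
    }
    return max(scores, key=scores.get), scores


training = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)],
}
label_near_a, _ = parzen_classify((0.5, 0.5), training)
label_near_b, _ = parzen_classify((4.5, 4.2), training)
```

Each training pattern contributes one kernel, so the density estimate for a class is the average of the kernels placed on its examples. A smaller `sigma` gives sharper, more local estimates.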

8.6.  Vector  Quantization 

The purpose of vector quantization is to produce a code from an n-dimensional
input pattern. The code is passed across a channel and then used to
reconstruct the original input with a minimum amount of distortion. There
have been several techniques proposed to perform vector quantization (Gray,
1984), with one of the most successful being the LBG algorithm (Linde, Buzo &
Gray, 1980). Learning Vector Quantization (see §5.5.) is a method of
developing a set of reference vectors from a data set and is very similar to
the LBG algorithm. A comparison of these two techniques can be found in
Ahalt, et al. (1990).
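
The encode/decode cycle and the idea of developing reference vectors from data can be sketched as below. The training loop is a simple winner-moves-toward-input rule chosen for illustration; it is neither the LBG algorithm nor a faithful implementation of Learning Vector Quantization, and the codebook and data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance, used here as the distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(codebook, pattern):
    """The code is the index of the closest reference vector."""
    return min(range(len(codebook)), key=lambda k: squared_distance(codebook[k], pattern))


def decode(codebook, code):
    """Reconstruct the input as the reference vector named by the code."""
    return tuple(codebook[code])


def train_codebook(codebook, data, rate=0.1, epochs=50):
    """Move the winning reference vector a small step toward each input (in place)."""
    for _ in range(epochs):
        for pattern in data:
            winner = codebook[encode(codebook, pattern)]
            for d in range(len(winner)):
                winner[d] += rate * (pattern[d] - winner[d])
    return codebook


codebook = [[0.0, 0.0], [10.0, 10.0]]  # initial reference vectors
data = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2)]
train_codebook(codebook, data)
code = encode(codebook, (1.1, 1.0))
reconstruction = decode(codebook, code)
```

Only the integer `code` needs to cross the channel. The receiver, holding the same codebook, recovers an approximation of the input whose distortion is the squared distance to the chosen reference vector.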

8.7.  Radial  Basis  Functions 

A radial basis function is a function that is symmetric about a given mean
(e.g. a Gaussian function). In pattern classification a radial basis function
is used in conjunction with a set of n-dimensional reference vectors, where
each reference vector has a radial basis function that constrains its
response. An input pattern is processed through the basis functions to
produce an output response. The mean-variance connection topologies that
employ the backpropagation algorithm (Lee & Kil, 1989; Robinson, Niranjan, &
Fallside, 1988) as described in §5.7.2. are methods of automatically
producing the proper sets of basis functions (by adjustment of the variances)
and their placement (by adjustment of their means).

8.8.  Machine  Learning 

Neural networks are not the only method of learning that has been proposed
for machines (although they are the most biologically related). There are a
large number of machine learning procedures that have been proposed over the
course of the past thirty years. Carbonell (1990) classifies machine learning
into four major paradigms (pg. 2): "[I]nductive learning (e.g., acquiring
concepts from sets of positive and negative examples), analytic learning
(e.g., explanation-based learning and certain forms of analogical and
case-based learning methods), genetic algorithms (e.g., classifier systems),
and connectionist learning methods (e.g., nonrecurrent "backprop" hidden
layer neural networks)." It is possible that some of the near-term
applications might find it useful to combine two or more of these machine
learning techniques into a coherent solution. It has only been recently that
this type of approach has even been considered.
ABSTRACT


An artificial neural network (ANN) is a software implementation of a neural paradigm,
and, therefore, such projects yield to many of the disciplines of software engineering.
On the other hand, many issues that must be faced as the project proceeds are unique
and require specialized knowledge to address.

This paper is concerned mainly with the management of such projects; however, in order
to propose the management issues, it seems necessary to understand, at least superficially,
the process of the design and implementation of a neural-based system. This paper
therefore begins with a proposal for a methodology for the conduct of a project
involving the choice, design, and implementation of a neural-based system. It outlines the
issues that should be considered and resolved at each step of the project.

Based  on  this  methodology,  a  project  management  plan  can  be  put  in  place.  Such  a  plan 
calls  for  a  set  of  milestones  and  design  reviews  for  various  levels  of  management  (and  the 
customer)  and  a  corresponding  document  set  designed  to  prove  a  milestone  has  been 
reached,  and,  finally,  that  the  original  requirements  have  been  met. 


1.0  INTRODUCTION 


This  paper  brings  together  past  experience  in  the  development  of  software  systems,  including 
expert  systems  and  neural  nets,  in  an  attempt  to  formulate  a  system  design  methodology  for  neural 
net  projects.  This  is  an  important  requirement  for  both  the  customer  and  the  developer  if  such 
projects  are  to  become  a  professional  activity  and  commercially  feasible. 

As  with  the  early  days  of  expert  system  projects  there  seems  to  be  a  host  of  issues  unique  to  neural 
computing  which  would  suggest  that  the  rules  of  good  project  design  and  management  can  be 
ignored.  It  is  the  thesis  here  that  these  rules  cannot  be  ignored  and  that  there  is  little  excuse  for 
'hacking'  towards  a  solution.  There  are,  in  fact,  critical  issues  to  be  resolved  and  there  are 
appropriate  times  to  face  these  issues,  and  there  is  also  a  minimum  level  of  knowledge  and 
experience  necessary  to  resolve  them  and  proceed.  It  is  towards  the  structuring  of  these  issues  and 
the  evolution  of  a  design  methodology  for  facing  the  issues  when  required,  and  for  providing  a 
mechanism  for  providing  evidence  that  they  have  been  faced,  that  this  work  is  dedicated.  When  all 
of  these  things  are  understood,  it  is  then  possible  to  develop  a  methodology  for  such  projects,  and 
from  this  a  project  management  approach. 

The present work has been strongly influenced by a general systems design methodology
established by the author [A-1], by extensive experience in doing battle with real neural network
applications, and by later work, specific to neural computing, by Robert Hecht-Nielsen [A-2] and
by Bailey & Thompson [A-3]. It is also influenced by the procedures and reporting mechanisms
defined in the United States Department of Defense Military Standards (MIL STDS) 2167A and
490.

In  Section  2  an  overview  of  the  methodology  is  given,  followed  by  four  sections  devoted  to  an 
examination  of  the  procedures  that  should  be  executed  and  issues  that  should  be  faced  at  each  step. 
In  Section  7,  a  plan  for  project  management  is  proposed.  This  plan  shares  many  features  with  any 
plan to manage the production of software systems. The details are based on the structure and the
resulting  milestones  of  the  proposed  methodology. 
2.0 THE METHODOLOGY - AN OVERVIEW


The creation of an artificial neural network is essentially a software project with a special set of
rules and issues that must be observed and addressed by the design team. It is important to
distinguish here between a project in which no specific goals or deliverables are expected, such as a
familiarization exercise, and a project with a defined level of effort, deliverables and a budget.
We are specifically concerned with the latter.

Neural engineering is not as advanced as other aspects of software engineering; however, it would
be folly to believe that creating and managing a project in neural engineering is different from any
other software undertaking. A methodology is therefore required, and proper project
management and control are essential to the successful completion of anything but a toy project. The
methodology is strongly influenced by the concepts developed in the DND standards for software
systems development. That system produces three specifications, called the Level A, B, and C
Specifications. The Level A Specification (the A Spec.) describes the end-user problem to be solved
and provides the functionality and performance that must be achieved and the constraints
which must be satisfied. The Level B Specification (the B Spec.) is prepared by the design team
and describes the top-level design of the system to be built. This specification is usually reviewed
at a Preliminary Design Review (PDR). The Level C Specification is a detailed design
document which describes the system to be built. This document is reviewed at a Critical Design
Review (CDR).

In addition to design reviews attended by the customer, there are lower-level design reviews,
usually conducted by the design team on a regular basis. These internal reviews keep
the project on track and are invaluable preparation for the more public reviews. A good
methodology contains an intrinsic modularity that provides points at which the state of a project
can be assessed and reviewed and, if necessary, corrective action taken to ensure and maintain
convergence to the original requirements.

The  methodology  proposed  here  has  four  major  phases: 

Requirements  Analysis 
Logical  Design 
Implementation 
Integration  and  Maintenance 

Each of these phases marks a major milestone at which the project can be evaluated and decisions
made as to progress and continuation.

The requirements analysis is sometimes referred to as 'functional specification development'
and results in the definition of the equivalent of the Level A Specification. This phase provides the
interface from the original problem to the functional specifications. During this phase, the
desired functionality and performance of the final system are specified. In addition, the user and
system interfaces of the final system should be outlined. As part of this analysis, it is important to
define the constraints (functional, economic and otherwise) that the design team must consider
during the design phase. In general, the approach is to consider the system to be built and specify
how it should appear to the user and, if appropriate, how it will interface to other portions of a
total system. An important aspect of this phase is to determine the available data sets and how the
final system will be evaluated for acceptance.

The logical design phase involves selecting the appropriate set of neural paradigms, designing
the network and finally the training regime. Each of these sub-phases constitutes an ideal point for
an intermediate design review and project milestone meeting. Part way through this phase, a
Preliminary Design Review (PDR) should produce the equivalent of the Level B Specification.
This is normally reviewed by the customer in terms of the original requirements laid out in the A
Specification. The conclusion of this phase should provide the implementation team with the
equivalent of the C Specification - a clear set of specifications for implementation.

The implementation phase is when the neural system is created, trained and tested. An
important part of this phase is the choice of the implementation platform, the detailed training and
the testing (and often the debugging) of the network. This phase demands the most 'on the bench'
experience, since the gulf between theory and practice is, in some aspects of neural systems
engineering, very wide indeed. The result of this phase is the product, ready for integration and
delivery.

The final system must be delivered, integrated and maintained over its lifetime. This phase
reinforces the need for an agreed upon acceptance plan and a document set that will permit
maintenance. Experience will confirm that these details should be considered at the beginning,
rather than at the end of the project.

The  management  of  any  project  is  intrinsically  bound  to  the  methodology  being  followed  by  the 
design  team.  A  good  methodology  has  many  attributes  which  simplify  the  overhead  associated 
with  its  management.  Of  great  importance  is  its  modularity,  which  yields  milestones  at  which 
progress  can  be  measured  and  control  exerted.  Other  factors  are  peculiar  to  the  particular 
software paradigm, and these will be compared and explored in the final section.


3.0  REQUIREMENTS  ANALYSIS 


3.1  Introduction 

Requirements analysis forms the first step in any project; however, it is often overlooked in
projects designed to exploit new technologies, and often a statement of need is mistaken for a
definition of requirements.

The  detailed  mechanisms  and  the  documentation  of  the  requirements  will  depend  on  the  formality 
of  the  project  organization,  however,  even  for  in-house  projects,  time  spent  on  requirements  will 
benefit  the  project  by  finding  a  common  ground  for  the  project  team. 

It is suggested here that the following form the minimum considerations that should precede a
neural net project: bound the problem, bound the project, define the acceptance tests, and finally
define the total deliverables package. These ideas are not profoundly different from any other
project, and will bring a focus to the project which will prove to be invaluable. In a global sense
the issues are 'What are we trying to build?', 'Under what set of constraints are we trying to build
it?' and 'What demonstration will we require to prove it has been built?'

3.2 Bound the Problem - What Are We Trying to Build?

A necessary preamble to the final definition of requirements is to obtain from the user(s) a clear
idea of what it is that will satisfy their needs. This is often difficult because the operational
language of the users may not be at a sufficiently technical level to be easily translated into
comprehensible technical jargon. Nevertheless, unless this step is clarified in some detail, almost
always what is produced will not be what was expected. This situation does not apply only to
neural networks, as your experience will confirm.

3.2.1  Statement  of  Required  Functionality 

The  need  here  is  for  a  concise  statement  of  what  functions  the  end-product  will  execute.  This  is 
often  stated  in  the  user's  vocabulary,  and  must  be  eventually  translated  into  technical  jargon  by  the 
requirements  analysis  team.  The  user  should  be  encouraged  to  define  'what'  is  required  in  as  few 
words  as  possible.  This  discipline  tends  to  focus  the  need  and  removes  concepts  of  'how'  it  should 
be  done  from  the  discussion. 

3.2.2  Solution  Requirements 

In this section a more detailed statement of the solution is given. This should include the type of
solution, the acceptable accuracy of the output, and the time constraints, if any, that are necessary.

3.2.3  Data  Sources 

The  data  definitions  should  include  the  data  available  for  training  and  testing  the  system  as  well  as 
the  data  input  to  the  final  system  if  the  format  is  different. 
3.2.4 Define the Interfaces

System  interfaces  include  all  hardware  and  software  interfaces,  including  the  requirements  of 
operation  with  specific  operating  systems  and  existing  software. 

Often overlooked is the need to specify high and low level protocols for control synchronization
and the format and structure of data passed into and out of the surrounding system.

The machine-human interface is often the most crucial in the user's final acceptance of the system.
It is also the most difficult to specify in its entirety. In the end, every screen interface and the control
protocols for mode changing, screen manipulation and data passing must be specified.

3.3  Bound  the  Project 

The project is bounded by specifying the total budget, which fixes the level of effort. In addition,
however, time is an often overlooked constraint. There are two aspects to timing constraints:
project time and performance time. If a solution has to be available in a certain time frame, this
imposes constraints which should be understood at the beginning. If the neural network must fit
into a larger system, response time may form a constraint. This will drive a host of considerations
from the neural topology to the execution platform.

3.4  Define  Acceptance  Tests 

Neural networks are trained to respond to a set of data elements which are alleged to define the
input space. Because of the data dependence of the success of a project, it is of critical importance
that the final set of tests that are formulated to determine success or failure, and hence acceptance
of the final product, be specified in detail. From the contractor's point of view a test set which is
not representative of the training set can spell disaster.

All  projects  start  with  the  accumulation  of  a  data  set  which  must  be  representative  of  the  problem 
and  must  eventually  be  used  for  training  and  testing.  The  problem  is  to  guarantee  that  the  training 
and  the  test  set  can  be  considered  representative. 
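
One common way to keep the training and test sets comparable is to split each class in the same proportion. The sketch below is offered only as an illustration of that idea; the paper does not prescribe a procedure, and the function name `stratified_split`, the test fraction, and the sample data are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.25, seed=0):
    """Split labelled samples so each class keeps roughly the same share in both sets.

    Returns (train, test) as lists of (sample, label) pairs. Every class
    contributes at least one item to the test set.
    """
    rng = random.Random(seed)  # fixed seed so the split can be reproduced and documented
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    train, test = [], []
    for label, items in by_class.items():
        items = items[:]  # copy so the caller's data is not reordered
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test += [(s, label) for s in items[:n_test]]
        train += [(s, label) for s in items[n_test:]]
    return train, test


samples = list(range(100))
labels = ["a"] * 80 + ["b"] * 20  # hypothetical, unbalanced data set
train, test = stratified_split(samples, labels)
```

Recording the seed and the fraction alongside the data makes the acceptance test repeatable, which matters when the split itself is part of what the customer and contractor agree on.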

3.5  Define  the  Deliverables 

3.5.1  Documentation 

In order to maintain the neural net a complete set of documentation is required. A description of a
neural net consists of a description of the paradigm and the implementation topology. In addition
the training and test set should be documented. It is most useful when retraining the system to
have a knowledge of the training parameters and the details of the training regime. Finally, any
interface software and any restrictions on the execution platform should be documented.

3.5.2  Code 

The most notable difference between the documentation of a neural paradigm and classical
software is the 'black box' character of neural nets. The concept of documented code is not
applicable since the character of the neural net response is buried in the topology and the weights
of the neurons. Furthermore, as discussed elsewhere, maintenance and extension of a neural net
are different in concept from those of classical software.

The documentation must be sufficient to permit the reconstruction of the neural net topology and
the weights of each neuron. This can take the form of a description of the paradigm and the
topology and a printed list of the weights (despite its length in some cases). With this data the
network can be reconstructed, and retraining made necessary by minor changes in the input sets can
often be shortened by beginning with the trained set of weights.
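
In practice the "list of the weights" is more conveniently kept in a machine-readable file than on paper. The sketch below shows one possible record format using JSON. The field names (`paradigm`, `layer_sizes`, `weights`) and the tiny example network are hypothetical and are not taken from the paper.

```python
import json


def describe_network(paradigm, layer_sizes, weights):
    """Bundle everything needed to rebuild the network into one record."""
    return {"paradigm": paradigm, "layer_sizes": layer_sizes, "weights": weights}


def save_network(description):
    """Serialize the record to text that can be archived with the documentation."""
    return json.dumps(description, indent=2)


def load_network(text):
    """Rebuild the record and check that the weights match the stated topology."""
    description = json.loads(text)
    sizes = description["layer_sizes"]
    for k, matrix in enumerate(description["weights"]):
        rows, cols = sizes[k], sizes[k + 1]
        if len(matrix) != rows or any(len(row) != cols for row in matrix):
            raise ValueError(f"weight matrix {k} does not match the topology")
    return description


original = describe_network(
    paradigm="backpropagation",
    layer_sizes=[2, 3, 1],
    weights=[
        [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]],  # input layer -> hidden layer
        [[0.7], [-0.8], [0.9]],                # hidden layer -> output layer
    ],
)
restored = load_network(save_network(original))
```

The restored record can seed a retraining run, so that training resumes from the delivered weights rather than from a random start.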
4.0 LOGICAL DESIGN


4.1  Introduction 

The  logical  design  phase  begins  the  process  of  translating  the  requirements  into  a  proposal  for 
implementation.  In  this  phase  all  the  capabilities  of  neural  computing  paradigms  should  be 
examined  to  determine  the  best  approaches  to  satisfying  the  requirements. 

This  phase  is  often  a  preliminary  step  in  a  bid/no-bid  situation.  Inappropriate  requirements 
formulated  by  a  potential  customer  can  lead  to  a  no-win  situation  if  a  neural  paradigm  is 
demanded,  accepted  for  design  and  delivery,  and  is  intrinsically  inappropriate. 

There  are  no  guarantees  in  this  field,  however,  a  few  preliminary  considerations  will  enhance  the 
probability  of  the  right  choice. 

4.2  Confirm  the  Application 

The  logical  design  team  should  begin  the  design  process  by  reconfirming  that  a  neural  computing 
paradigm  is  suitable  for  the  problem.  In  general  if  an  expert  system  solution  will  satisfy  the 
requirements  then  it  should  be  chosen  before  a  neural  solution,  and  by  extension  if  a  classical 
software  algorithm  will  fulfill  the  needs,  it  should  be  chosen.  The  design  team  should  look  at  not 
only  the  functionality  of  neural  computing  but  at  the  availability  of  data  and  the  impact  of  the  other 
requirements. 

4.2.1  Characteristics  of  Successful  Applications 

Successful neural applications have the following characteristics:

1. The algorithm to solve the problem is unknown or expensive to discover.

2. Heuristics or rules to solve the problem are unknown or perhaps difficult to enunciate.

3. The application is data intensive and a variety of data sets are available which can be
identified as correct or which describe specific examples.


Several classes of problem have these characteristics at this time: pattern recognition, pattern
completion, pattern classification, and statistical mapping.

Applications within these classes include character recognition, image classification, forecasting,
incident detection, signature identification, robot control, and signal processing.

In  general,  it  should  be  determined  that: 

1.  Conventional computer technology is unsuitable or inadequate.

2.  The  application  requires  qualitative  or  complex  quantitative  reasoning. 

3.  The  solution  is  derived  from  inter-dependent  or  correlated  factors  which  are  difficult  or 
impossible  to  quantify. 

4.  Data  is  available  and  corresponding  known  solutions  can  be  derived. 

4.2.2  Characteristics  of  Poor  Applications 

Poor applications include:

In general, those:

1.  For which algorithms or rule-based solutions are possible.

2.  That require deduction and a logical approach.

3.  That require explanations of procedures.

4.  That are essentially mathematical computations or transformations.


In particular, those:

1.  Requiring precise mathematical computations.

2.  In which answers must be explained or the steps documented.

3.  For which an adequate and representative data set is not available for training and testing.

4.2.3  Choosing  a  Software  Paradigm 

If the final system is to operate embedded in the original development system, the choice of a
software paradigm is relatively unimportant. In situations in which the system must interface to a
variety of data bases, graphics displays, and surrounding software, the representation of the whole
system, and indeed the underlying language, may become an important issue to resolve. The
concern is the overhead induced in creating the software interfaces that link the system, and the
potential consequences for performance.

4.3  Select  the  Neural  Paradigm 

This step involves selecting a potential set of neural paradigms which match the application
requirements. The issues here are the size, the training and time constraints, and the output type.
Table 1 contains a comparison of the capabilities of a variety of neural paradigms, which could be
updated as newer technologies become proven. The designer should choose a potential set of
paradigms which match the requirements, and prioritize the most likely candidates. In a
constrained environment (time and money) the highest priority candidate is started first. However,
the other candidates may have to be called upon if unforeseen events prevent training convergence
or if performance is not as expected.

4.3.1  Network  Paradigm 

The network choices include the number of layers or slabs, the number and type of nodes, the size
of the hidden layers, the number and type of output nodes, and the connectivity of each neuron
and layer.

4.3.2  Output  Type 

Choosing the Size of the Output Layer: The number of output neurons depends on
the paradigm being used and on the type of output being generated. There are two broad
categories of outputs: hetero- and auto-associative. Auto-associative networks have the same
number of outputs as inputs, whereas hetero-associative networks generally have fewer. These
categories are far too broad, and a further division into the various expected outputs is useful.
These depend on the application and can be categorized as: Classification, Images or Patterns,
Optimizations, and Numbers.

Classification: The outputs are interpreted as categories or attributes. The output is either a
binary vector or a real number. Generally classification is indicated by a binary vector of all
zeros except for the element corresponding to the class of the input data, which is a one. In some
cases, real numbers are used to indicate further information such as the confidence of the
classification.
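
The binary-vector convention can be sketched in a few lines of Python. The function names are
hypothetical, and treating the largest raw output as a rough confidence value is an assumption
made for this example rather than a rule stated in the text.

```python
def encode_class(index, num_classes):
    """Return a binary target vector: all zeros except a one at `index`."""
    if not 0 <= index < num_classes:
        raise ValueError("class index out of range")
    return [1 if i == index else 0 for i in range(num_classes)]


def decode_output(outputs):
    """Pick the winning class and report its raw output as a rough confidence."""
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    return winner, outputs[winner]


# Target vector for class 2 of 4, and the decoding of one real-valued response.
target = encode_class(2, 4)
label, confidence = decode_output([0.05, 0.10, 0.80, 0.05])
```

With this convention the number of output neurons equals the number of classes.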

Images or Patterns: In this application the outputs are interpreted as an image or pattern
generated in response to the input. The number of outputs depends on the detail of the expected
patterns.

Optimization:  The  output  size  depends  on  the  optimization  problem  and  the  information 
required  to  interpret  the  results  of  the  class  of  input  data  being  optimized. 

Numbers: Numbers are a subset of the other categories; however, in general this category is used
when the output represents a numerical quantity, such as a power level or a switch setting.


4.3.3  Training Method

Neural  training  falls  into  three  categories:  supervised,  unsupervised  and  reinforced.  The  choice 
depends  on  many  factors  but  is  strongly  influenced  by  the  availability  of  data.  Supervised  learning 
requires  pairs  of  data  vectors  consisting  of  the  input  pattern  and  the  correct  output.  The  training 
data  must  therefore  contain  the  solution  the  network  is  expected  to  provide.  Generally  this 
training  mode  demands  extensive  data  sets,  and  can  consume  a  long  time  to  achieve  the  correct 
responses. 

Unsupervised learning classifies input data patterns according to some form of nearness criterion.
The classification will depend on the structure of the training data, and the training time is usually
much shorter than for supervised learning.

Reinforced learning is a compromise between the two. It requires only the input data and
indications of the goodness of the response (a reward signal). Reinforced learning can consume
much longer times than the other two, but training data requirements are less stringent, although
the goodness criteria must be attached to each response.

4.3.4  Time  Constraints 

All aspects of the training and operation of neural networks are computationally intensive, since
every neuron performs a sum-of-products calculation, often utilizing floating point operations.
Training time is usually not counted as part of the operational timing constraints; however, from a
project point of view, training times can be very large on an inadequate platform. This time is
pure delay, which tends to limit the iterations that can be tried in a fixed time-frame, and can of
course finally influence the delivery time table.

If the network must fit into a hybrid software system then the operational response time should be
specified. If it is part of a diagnostic or prediction system, the response time may not be critical.
In any event, response time should be considered as a constraint which can influence the size of
the network and eventually the cost of the execution platform.

4.4  Network  Design 

The design of the network involves three basic issues: the node, the network topology, and the
training details.

4.4.1  Node  Level 

The node or neuron design is constrained by the type of input to be used, the transfer function and
the nearness computation. The input data format has already been specified and is usually
unalterable. The nearness function is usually an inner product (a sum of products); however, others
can be used, such as a vector difference. The transfer function is the nonlinearity following the
nearness computation. This can be linear, signum, sigmoid, or hyperbolic tangent. The selection
is determined by the characteristics of the region boundaries and, in the case of backpropagation
training, by the necessity of a differentiable function. The calculation of the nonlinearity affects the
computational complexity of each neuron, and the simplest possible function should be chosen.
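
A minimal Python sketch of a single node, using an inner product as the nearness computation
followed by one of the four transfer functions named in this section. The bias term, the convention
that signum returns +1 at zero, and all function names are assumptions introduced for the example.

```python
import math


def inner_product(weights, inputs):
    """Nearness computation: a sum of products of weights and inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


# Transfer functions applied to the result of the nearness computation.
TRANSFER = {
    "linear": lambda s: s,
    "signum": lambda s: 1.0 if s >= 0 else -1.0,  # not differentiable at 0
    "sigmoid": lambda s: 1.0 / (1.0 + math.exp(-s)),
    "tanh": math.tanh,
}


def neuron_output(weights, inputs, bias=0.0, transfer="sigmoid"):
    """Output of one node for the given inputs."""
    return TRANSFER[transfer](inner_product(weights, inputs) + bias)


def sigmoid_derivative(y):
    """Derivative of the sigmoid, written in terms of its output y."""
    return y * (1.0 - y)


y = neuron_output([0.5, -0.5], [1.0, 1.0])  # inner product is 0.0
slope = sigmoid_derivative(y)
```

The signum function has no useful derivative, which is why it is unsuitable for backpropagation
training, whereas the sigmoid derivative is cheap to compute from the output already available.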

4.4.2  Network  Level 

At  the  network  level  of  design,  the  topology  of  the  interconnection  of  the  neurons  must  be  decided. 
This  involves  the  number  of  layers  or  slabs  within  a  layer,  the  number  and  type  of  nodes,  the  size 
of  the  hidden  layers,  the  number  and  type  of  output  nodes,  and  finally  the  detailed 
interconnectivity  of  all  the  neurons.  Several  paradigms  have  a  fixed  topology  in  that  the  number 
of  layers  is  predefined,  e.g.,  Hopfield  nets,  Kohonen  self-organization  maps,  etc. 

In backpropagation nets, hidden layers act as levels of abstraction. Adding hidden layers will
increase the ability to abstract characteristics of the input classes; however, training will take
longer, and in the end the training of multilayer networks by backpropagation becomes very tedious
and convergence is not necessarily guaranteed in practice. The number of neurons in a hidden
layer affects the ability to generalize the characteristics of the input data. Generalization and
memorization become critical issues in selecting the number of neurons in the hidden layer.

The two opposing schemes for backpropagation topology either achieve memorization of the input
training set or achieve generalization of the features of the training set to identify examples never
before seen. In general, increasing the number of neurons in the hidden layer offers sufficient
memory for the network to memorize the training set. Conversely, reducing the number of neurons,
up to a point, forces generalization.

Determining the exact number of neurons to achieve generalization is not a solved problem, and is
often achieved by experimentation. The difficulty lies in the effect on the test set. A network
trained to memorize will achieve very good results if the test set is equivalent to the training set;
conversely, the performance can be very poor if the test set includes new examples outside the
training set.
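
One way to organize such experimentation is sketched here in Python. It assumes that each
candidate hidden-layer size has already been trained and scored, and it picks the smallest size whose
test error is close to the best observed. The selection rule, the tolerance, and the error figures are
illustrative assumptions (the figures are made up for the example and are not measurements).

```python
def select_hidden_size(results, tolerance=0.02):
    """results maps hidden-layer size -> (training_error, test_error).

    Return the smallest size whose test error is within `tolerance`
    of the best test error observed.
    """
    best_test = min(test for _, test in results.values())
    for size in sorted(results):
        if results[size][1] <= best_test + tolerance:
            return size


def memorization_gap(results):
    """Test error minus training error; a large gap hints at memorization."""
    return {size: test - train for size, (train, test) in results.items()}


# Made-up error rates for four candidate hidden-layer sizes.
trial_results = {
    2: (0.30, 0.32),   # too few neurons: poor on both sets
    4: (0.10, 0.13),
    8: (0.05, 0.12),
    32: (0.00, 0.35),  # perfect on training data, poor on new examples
}
chosen = select_hidden_size(trial_results)
gaps = memorization_gap(trial_results)
```

In this made-up data the largest network shows the pattern described in the text: very low training
error together with poor results on examples outside the training set.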

4.4.3  Training  Issues 

The issues to be addressed before training begins are both strategic and tactical. Strategically the
training falls into three phases (as in chess) with a beginning, a middle and an end game. In each
of these phases, training parameters can be varied to hasten or encourage convergence. The plan
should outline the training parameters for each phase and some measurements which will suggest
when each phase has been completed. On the other hand, measurements should be determined to
decide when training is not converging, and the time has come to back up and try a new set of
parameters. In practice it is often difficult to predetermine these measurements exactly, and
sometimes a certain amount of synergy is necessary to observe lack of progress and to suggest
corrective actions. The need for a theoretical background, experience, and good judgement in
combining the theory and experience becomes evident during this phase.

These  issues  will  be  discussed  in  more  detail  in  Section  5.4. 


5.0  IMPLEMENTATION 


5.1  Introduction 

The implementation phase is the crucial phase in the development of a neural project. Despite all
the preparation, it is not always possible to guarantee convergence of the training; however,
following a well established methodology [B-1] will enhance the probability.

The key activities are: Characterize and Prepare the Input Data Set, Choose the Development
System, Train the Network, and be prepared to Debug and Test the Network.

5.2  Characterize  the  Input  Data  Set 

5.2.1  Assemble  and  Prepare  the  Input  Data  Set 

This phase consists of two major activities: assembling the data set and preparing it for training
and eventual testing of the network.

The input data set refers to all the data that will be used both for training and testing the network.
Initially the concern is with the quality of the data. Under some circumstances the data can be
ambiguous, error-ridden, come from multiple sources and formats, and in some cases have
conflicting judgements on its classification.

Preparing the data refers to two major activities: accommodating the input formats of the
development environment, and preprocessing the data to enhance its training potential.
Accommodating the input formats suggests the potential need for code conversions and
normalization scaling. Preprocessing, such as creating ratios or some form of filtering, is
sometimes useful in enhancing training or the meaningfulness of the results. Obviously all training
and test data sets must be brought to the same format before being used.
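
As a sketch of normalization scaling, the Python fragment here scales each input column to the
range 0 to 1. Min-max scaling is only one possible choice, and fitting the ranges on the training
data and reusing them for the test data is an assumption of this example, adopted so that both sets
end up in the same format.

```python
def fit_min_max(rows):
    """Record the minimum and maximum of each column of the training rows."""
    columns = list(zip(*rows))
    return [(min(column), max(column)) for column in columns]


def apply_min_max(rows, ranges):
    """Scale each column to [0, 1] using previously fitted ranges."""
    scaled = []
    for row in rows:
        scaled.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, (low, high) in zip(row, ranges)
        ])
    return scaled


# Two input variables with very different magnitudes.
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
test = [[15.0, 250.0]]

ranges = fit_min_max(train)
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)
```

Test values that fall outside the fitted ranges would scale to values outside 0 to 1, which is itself
a useful warning that the test data differs from the training data.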


Of critical importance during this phase is the definition of the acceptance test set. This is the final
data input which will define if the system is performing with sufficient accuracy to be useful to the
end-user (and it may determine if the final invoice is accepted).

Typically the final test set is not made available to the development team. Since many neural nets
can be made to memorize a given set of input data, it is clear that the acceptance test set should be a
set which at least includes samples that have not been used in training.

On the other hand the development team needs a test set to be used to evaluate the effectiveness of
the training regime. A large set of data covering all interesting cases should be made available to
the development team from which they can choose the optimal training and test sets. The goal is to
force the neural net to generalize the characteristics of the input data classes based on the training
set so that appropriate responses to the test sets will be derived.

5.2.2  Select  the  Training  Set 

The selection of the training set for neural paradigms is the most critical decision affecting the
final outcome. It is easy to say that the set must represent the total range of inputs, with a relative
density of occurrences that represents the final desired results. This is not easy to accomplish:
since the n-dimensional volume of the total space is impossible to define exactly, the choice of
examples for training (and testing) is difficult.

A  training  set  should  be  assembled  as  a  subset  of  the  total  data  set.  In  a  real  sense  all  the  data 
assembled  is  a  potential  candidate  as  a  training  set.  Including  all  this  data,  however,  will 
profoundly  affect  the  training  time,  and  the  cost  of  the  project. 

The training set can be considerably smaller than the total data set and should be chosen to achieve
generalization across the various classes of the problem. The training set should represent the key
features of the problem. A representative set should cover the breadth of the problem to be
solved. For example, in a pattern recognition problem, the set should cover the range of problems
in the classes of images. In a decision or control problem, it should cover all the significant cases.
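
One simple way to draw a training subset that covers every class in roughly its original proportion
is a stratified split, sketched here in Python. The function name, the 50 percent fraction, and the
fixed random seed are assumptions for the example; real projects may need more careful selection
than random sampling within each class.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.5, seed=0):
    """Split (features, label) pairs so that every class is represented in
    the training set in roughly the proportion found in the full data set."""
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)
    train, held_out = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        # Keep at least one example of every class for training.
        cut = max(1, round(len(members) * train_fraction))
        train.extend(members[:cut])
        held_out.extend(members[cut:])
    return train, held_out


# Six examples of class "a" and four of class "b" (features are placeholders).
samples = [(i, "a") for i in range(6)] + [(i, "b") for i in range(4)]
train_set, remaining = stratified_split(samples, train_fraction=0.5)
```

The held-out remainder is then available as a pool from which development test sets can be drawn.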

In some cases it is possible to partition the training set into routine, difficult and borderline cases.
This partition will be most useful in determining convergence conditions and, in particular, lack of
training convergence, if this occurs.

5.2.3  Select  a  Test  Set 

The test set should provide evidence that the system will be useful to the customer. The customer
and the contractor share a responsibility to ensure that this set reflects the customer's perception of
an adequate test, and the contractor's technical understanding of the relationship between the
training and the test conditions. The test set should reflect a distribution of input vectors similar
to that of the training set. A test set with parameters outside the training set can lead to failed tests.

5.3  Choose  the  Development  System 

Experience suggests that the choice of the development platform will have the most profound
effect on the success of the project. A wide variety of software development systems have
appeared over the last few years, some of which are useful as experimental learning tools for a
University Laboratory, and others which provide a leaner environment for the skilled
professional. The development system includes the software simulator, the operating system and
the hardware platform.

Many simulation platforms are now available to facilitate the process of training, testing,
debugging, and creating and displaying the system and human interfaces. These platforms operate
on a variety of workstations and PCs. The characteristics of the platform will profoundly affect
the level of effort needed to set up, train, debug and test the system. Depending on the financial
resources committed to the project, a system should be judged based on:

The Vendor
    Support and Training
    Fielded Systems
    Price

Functionality
    The Neural Paradigms Implemented
    User-Programmable Paradigms
    Training and Debugging Facilities

Interfaces
    Graphics and Displays
    System Software Interfaces
    Graphics Interfaces
    Database Interface
    Language Interfaces

Support Platforms Needed or Required
    Operating System
    Hardware Platform
    Multiple Screen Processing (Windows)

The relative importance of these factors will depend on the project and the team's experience. The
final system can either operate in a stand-alone mode or be part of a larger system. In either case,
the development environment may have to be suitably modified in order to integrate the
operational network into the final system; thus portability may also be an important consideration.
Finally, in the choice of a new system, the whole system capability should be carefully traded off
against not only the learning curve required to begin work, but also the learning curve required to
become really proficient.

5.4  Training  the  Network 

5.4.1  Training  Phases 

Many training paradigms have a sequence of phases. In each phase, the training parameters can be
optimally adjusted to speed the process.

5.4.2  Selecting  Training  Parameters 

Once  the  paradigm,  the  structure  of  the  neuron  and  the  network  topology  have  been  decided,  a 
choice  of  training  parameters  is  required.  In  backpropagation  training,  for  example,  the  initial 
weights,  the  learning  rate,  and  the  momentum  must  be  selected  before  training  begins.  The 
implementation  team  should  have  considered  the  choice  of  these  parameters  and  if  appropriate 
considered  the  variation  of  parameters  as  training  proceeds  and  convergence  begins  to  occur  (or 
otherwise). 
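
The role of the three backpropagation parameters can be illustrated with a commonly used form of
the momentum update, sketched here in Python on a toy error surface. The update rule as written,
the parameter values, and the quadratic error function are assumptions chosen for the example; they
are not prescribed by the text, and a real network would obtain its gradients by backpropagation.

```python
import random


def initial_weights(count, scale=0.1, seed=0):
    """Small random starting weights in [-scale, scale]."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(count)]


def momentum_step(weights, gradients, velocity, learning_rate, momentum):
    """One update: v <- momentum * v - learning_rate * g, then w <- w + v."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradients)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy error surface E(w) = sum(w_i ** 2), whose gradient is 2 * w_i.
weights = initial_weights(3)
velocity = [0.0] * len(weights)
for _ in range(200):
    gradients = [2.0 * w for w in weights]
    weights, velocity = momentum_step(weights, gradients, velocity,
                                      learning_rate=0.1, momentum=0.9)
```

Varying the parameters as training proceeds amounts to passing different `learning_rate` and
`momentum` values to the update in different phases.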

5.4.3  Convergence  and  Nonconvergence 

Despite the theoretical proofs of convergence, experience suggests that neural training often results
in a hung situation in which the network will not converge. This can be caused by many factors:
for example, a poor choice of the training set, inappropriate training parameters, stabilization
in a local minimum, overtraining of some of the neurons, or network paralysis. Aside
from experience, which might suggest corrective approaches, it is in this situation that a powerful
simulation platform to assist in the debugging will be most appreciated.
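
A hung training run can be flagged automatically by watching the error history. The Python sketch
here uses one possible criterion, assumed for the example: training is considered stalled when the
best error over the most recent epochs has not improved on the best earlier error by a minimum
amount. The window length and improvement threshold are hypothetical and problem dependent.

```python
def has_stalled(error_history, window=5, min_improvement=1e-3):
    """True when the last `window` epochs did not beat the best earlier
    error by at least `min_improvement`."""
    if len(error_history) <= window:
        return False  # too early to judge
    best_before = min(error_history[:-window])
    best_recent = min(error_history[-window:])
    return best_before - best_recent < min_improvement


# Error per epoch for a run that is still improving and one that is hung.
improving = [0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
hung = [0.9, 0.5, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]

still_learning = not has_stalled(improving, window=3)
needs_attention = has_stalled(hung, window=3)
```

Such a flag only reports the lack of progress; deciding whether the cause is the training set, the
parameters, a local minimum, or paralysis still requires inspection of the network.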

5.5  Debug  and  Test 

In the broadest sense, the test set is chosen to achieve some level of acceptable response. As
discussed in Section 5.1.1, the test set should be partitioned into routine, difficult and boundary
cases.

This approach is termed 'black box' testing and is the most common. There are, however, other
testing procedures which are important in some cases and can be important indicators of bad
training and/or redundant layers or nodes. Some simulation packages, for example, permit the
viewing of the weights, and the on-line computation and presentation of errors, etc.

Each of these approaches will be discussed in the following. Finally, there is the question of what
to do if the network fails the acceptance tests. This will be discussed in the final section.


5.5.1  The Black Box Approach

Generally  a  neural  net  is  tested  by  comparing  input  and  output  for  the  appropriate  response.  The 
network  remains  essentially  a  black  box  with  its  internal  coding  and  data  transformations  being 
undecipherable.  Under  these  conditions  the  test  set  should,  if  possible,  be  divided  into  easy, 
difficult  and  boundary  sets.  Acceptance  criteria  should  be  developed  for  each  set.  Attention 
should  be  paid  to  comparisons  to  human  responses  when  this  is  possible.  Finally,  in  boundary 
cases,  it  is  useful  to  predefine  the  threshold  acceptance  level  of  the  output  responses. 
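
A black box test of this kind can be sketched in Python as follows. The network is treated as an
opaque function from inputs to outputs; the partition names, the acceptance criteria, the output
threshold, and the stand-in network are all hypothetical values chosen for the example.

```python
def evaluate_partition(network, cases, threshold=0.5):
    """Fraction of (inputs, expected_class) cases answered correctly.

    A response counts only if the winning output reaches `threshold`.
    """
    correct = 0
    for inputs, expected in cases:
        outputs = network(inputs)
        winner = max(range(len(outputs)), key=lambda i: outputs[i])
        if winner == expected and outputs[winner] >= threshold:
            correct += 1
    return correct / len(cases)


def acceptance_report(network, partitions, criteria, threshold=0.5):
    """Map each partition name to (accuracy, passed) against its criterion."""
    report = {}
    for name, cases in partitions.items():
        accuracy = evaluate_partition(network, cases, threshold)
        report[name] = (accuracy, accuracy >= criteria[name])
    return report


def stand_in_network(inputs):
    """Placeholder for a trained two-class network."""
    return [1.0 - inputs[0], inputs[0]]


partitions = {
    "easy": [([0.9], 1), ([0.1], 0)],
    "boundary": [([0.55], 1), ([0.45], 1)],
}
criteria = {"easy": 0.95, "boundary": 0.5}
report = acceptance_report(stand_in_network, partitions, criteria)
```

Keeping a separate criterion per partition makes it visible when a network passes overall but only
because the easy cases dominate the test set.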

5.5.2  Node  and  Layer  Redundancy 

Eliminating  nodes  and  indeed  whole  layers  can  substantially  reduce  the  computational  complexity 
of  the  final  system.  In  some  cases  removing  these  redundancies  will  increase  the  convergence  of 
the  training  process. 

Examination of the weights of each node will show that those with low weight values make a
negligible contribution to the final output and can likely be eliminated. Such pruning should be
followed by continued training to determine whether improved accuracy can be achieved, or to
ensure that the incremental contribution that has been removed is restored.

A  rule  of  thumb  suggests  that  weights  below  about  0.1  are  probably  redundant  and  can  be 
removed. 
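
The rule of thumb can be applied mechanically, as in the Python sketch here. Each row holds the
incoming weights of one node; the data layout and function names are assumptions for the example,
bias terms are ignored, and the 0.1 threshold is the value suggested in the text.

```python
def prune_small_weights(weights, threshold=0.1):
    """Zero every connection whose magnitude is below `threshold`."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def redundant_nodes(weights):
    """Indices of nodes whose incoming weights are all zero after pruning."""
    return [i for i, row in enumerate(weights) if all(w == 0.0 for w in row)]


# Incoming weights for three hidden nodes (hypothetical values).
hidden_weights = [
    [0.80, -0.05],
    [0.02, -0.09],
    [-0.30, 0.40],
]
pruned = prune_small_weights(hidden_weights)
candidates = redundant_nodes(pruned)
```

Nodes reported as candidates would be removed and the network trained further, as described in
this section, before the reduction is accepted.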

At the other extreme, nodes that have weights much in excess of others should be suspect, for this
may indicate overtraining and may contribute to a lack of generalization. Such a situation may
suggest a repeat of the training process, or indicate that some of the test set will fail.

5.5.3  Input  Node  Activation  Sensitivity 

In  some  applications  it  is  possible  to  determine  inappropriate  behaviour  by  carefully  selecting  a  test 
set  to  positively  reinforce  an  expected  output  at  a  given  node. 


5.5.4  Responses  to  Failed  Test-Procedures 

If, after successful training, a network fails to respond to the test set with acceptable results, there
is a whole sequence of issues that must be considered in a rational order by the design team.
In general these are:

The  training  and  test  set 
The  learning  algorithm 
The  network  design 
The  system  interfaces 

Training and Test Sets: The first thoughts are about the training and test set. The test and
training set should be re-examined for quality, representativeness and accuracy. The test set
must be chosen with the same characteristics as the training set. A test set with input members
different from the training set will invariably lead to testing failures.

The  Learning  Algorithm:  The  learning  algorithm  constants  should  be  examined. 

The Network Design: The node characteristics, architecture and connectivity of the network
should be re-examined.

The System Interfaces: All interfaces should be examined, including those between the
training set and the network, the user and the network, and any other interconnected software.


6.0  DELIVERY  AND  MAINTENANCE


6.1  Introduction 

This phase of the project resembles any software delivery phase. The principal difference is in the
mechanisms for demonstrating performance and functionality.

6.2  The  Acceptance  Test  Plan 

The  functionality  and  performance  tests  should  have  been  formulated  as  part  of  the  requirements 
analysis.  The  plan  to  satisfy  these  requirements  should  have  been  formulated  and  accepted  at  the 
preliminary  design  review. 

6.3  System  Integration 

Neural networks tend to be regarded as stand-alone systems; however, they can be integrated into
an overall system in two configurations: loosely coupled, or tightly coupled.

Loosely coupled neural networks (probably the most common at this time) communicate by
passing data files. Such systems function either as preprocessors, postprocessors, or as a
distributed system. Preprocessing networks prepare data for examination or processing by other
software modules. Postprocessing networks are often used to remove noise, classify patterns or
make predictions. A distributed system passes data to a neural net for analysis or to interface to
another system.
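
A loosely coupled stage that communicates by data files might look like the following Python
sketch. The JSON record layout, the file names, and the stand-in network are assumptions made for
this example; an actual system would use whatever file format the surrounding software expects.

```python
import json
import os
import tempfile


def run_file_stage(network, in_path, out_path):
    """Read input vectors from one file and write network responses to another."""
    with open(in_path) as source:
        records = json.load(source)
    responses = [
        {"id": record["id"], "outputs": network(record["inputs"])}
        for record in records
    ]
    with open(out_path, "w") as sink:
        json.dump(responses, sink)
    return len(responses)


def stand_in_network(inputs):
    """Placeholder for a trained network: returns the mean of the inputs."""
    return [sum(inputs) / len(inputs)]


with tempfile.TemporaryDirectory() as folder:
    in_path = os.path.join(folder, "inputs.json")
    out_path = os.path.join(folder, "outputs.json")
    with open(in_path, "w") as handle:
        json.dump([{"id": 1, "inputs": [0.2, 0.4]},
                   {"id": 2, "inputs": [1.0, 0.0]}], handle)
    processed = run_file_stage(stand_in_network, in_path, out_path)
    with open(out_path) as handle:
        results = json.load(handle)
```

Because the only contract between the modules is the file format, the network can be retrained or
replaced without changes to the surrounding software.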

Tightly coupled systems are more fully integrated, relying on data sharing to pass data. In a fully
integrated system the neural net tends to lose its identity and become another module in a larger
system.

6.4  System  Performance  Evaluation 

The  evaluation  of  performance  usually  includes  both  functionality  and  computational  complexity. 

Performance is judged by the successful treatment of the testing data set. Computational
complexity includes both the memory requirements of the program and the time needed to fulfill
its function.

6.5  Maintenance  Plan 

The  maintenance  plan  should  consider  three  major  issues:  Environmental  Modifications, 
Structural  Modifications  and  Interface  Modifications. 

Environmental modifications suggest that the character of the input data has changed.
This could occur from a wide variety of causes; however, the result is the need for a redefinition
of the training set and the complete retraining of the network.

Structural modifications suggest that the role of the neural net has been found to be unsatisfactory
or that it must be changed to accommodate newer roles in the system. A structural modification
suggests in the worst case a complete reconsideration of the design methodology or at best a
reorganizing and retraining of the existing network.

Interface modifications suggest software changes to the human or system interfaces. These
could arise from the need for more useful data presentation to humans, changes to the format of
the data base, or protocol changes to the surrounding system. Depending on the original
requirements, plans should be produced to show how these potential exigencies will be
approached.

We note that a maintenance plan, as in other software projects, depends on a document set that will
support the actions needed over the life of the system.


7.0  PROJECT  MANAGEMENT


7.1  Introduction 

In  this  section  the  basic  project  development  methodology  will  be  pulled  together  to  exhibit  a 
management  approach. 

7.2  Management  Overview 

A characteristic of any design methodology is its intrinsic mechanisms for project monitoring and
control. Project plans and methodologies all exhibit some structure and modularity, providing
points at which the state of the project can be measured and either forward or backward influence
exerted to ensure convergence on the original objectives, or to adapt to changes caused by
influences outside the project or by unforeseen events occurring in the project. These points in the
methodology are characterized by milestones, which occur at the conclusion of a predictable set of
activities and at which reports can be prepared and progress measured. Milestones are also points
at which control can be exerted to account for slippages caused by inappropriate predictions of the
level of effort or the level of difficulty, or by changes in the requirements.

In general there is a distinction preserved between major and minor milestones. There are many
ways of defining such a distinction depending on the environment and the project. For our
purposes a major milestone signifies a point in the design process where a significant goal has been
achieved, typically the completion of a set of tasks marking a logically complete step in the overall
process. A useful criterion is to assume that at a major milestone, a different team will take over
the project and must be provided with a set of specifications for their task. Thus, major milestones
are points in the project where significant documentation and evaluation occur. Minor milestones
are important events which are typically part of a larger logical task.

Staffing a neural network project depends on many factors. Project leaders should have
experience in software development, and preferably a working knowledge of the capabilities and
limitations of neural computing paradigms. Since a wide variety of skills are necessary, it is not
necessary for all team members to be experts in neural computing; however, one team member
should have the capability to do the system analysis and to determine the appropriate neural net
paradigm, as well as to judge the training and test sets. Programmers with conventional skills may
be required depending on the human and system interfacing requirements. Finally, since vast
amounts of data are usually required, experience with data structures and data manipulation
software is very useful.

The  methodology  as  outlined  contains  such  milestones,  and  the  issues  that  should  be  addressed  and 
activities  that  should  be  executed,  and  hence  the  reports  that  can  be  prepared  by  the  project  team  to 
provide  evidence  of  reaching  the  milestone  (or  otherwise).  In  this  section  we  will  outline  these 
milestones and suggest the form of reporting and actions that are appropriate for managers.

7.3  Project  Milestone  Reviews 
7.3.1  Major  Milestones 

The proposed methodology is characterized by four major milestones: Requirements Analysis,
Logical Design, Implementation, and Delivery and Maintenance. These mark the principal stages
in  the  project  life-cycle.  Each  is  characterized  by  attaining  certain  goals  and  each  can  be  validated 
by  determining  progress  and  achievements  as  discussed  in  the  previous  sections.  Essentially  at  each 
major  milestone,  the  project  team  can  be  required  to  prepare  reports  outlining  how  each  issue  has 
been  addressed  and  the  reasons  for  the  choice  of  each  alternative.  The  project  leader  is  typically 
charged  with  the  responsibility  of  sign-off  for  the  document  set  attesting  to  the  conclusion  of  the 
milestone. 

Major  Milestone  #1:  Requirements  Analysis 

In  typical  software  projects,  the  requirements  analysis  is  completed  by  the  customer's  team  based 
on the perceived needs of the end-user and results in a set of requirements specifications (the
A-Level Specifications). The design team has the problem of analyzing these specifications and
reducing these to a document set suitable for their purposes. In either case, evidence should be
presented  that  the  following  questions  have  been  answered  or  the  issues  have  been  addressed: 

What  is: 

The  required  functionality? 

The  performance? 

The  human  and  system  interface  definitions? 

The  acceptance  tests? 

The  operational  constraints? 

The  non-functional  constraints  on  the  final  system? 

The  maintenance  and  documentation  requirements? 

Finally, "Is there an adequate supply of useable data for training and testing?"

Major Milestone #2: Logical Design

The conduct of this phase has been outlined in Section 4. The conclusion of this phase is marked
by  the  presentation  of  evidence  to  demonstrate  either  the  consideration  and/or  attainment  of  the 
following: 

1.  The  application  has  been  confirmed,  in  the  sense  that  the  team  is  confident  that  a  neural 
computing approach will yield results.

2. The neural paradigm has been selected, including estimates or starting data for: the
network paradigm(s), including the network size, the output type, and the training
method; and the time constraints.

3. The network alternatives have been designed, including the node level, and the network
level.  The  training  issues  and  parameters  have  been  considered  and  allocated. 

Major Milestone #3: Implementation

The  level  of  effort  and  the  amount  of  time  required  to  implement  the  neural  net  is  very  difficult  to 
predict,  due  to  the  uncertainty  of  the  training  procedures.  It  is,  however,  during  this  phase  that 
detailed  management  is  required  to  ensure  that  the  project  is  converging  and  that  appropriate  steps 
are  being  taken  to  maintain  estimates  of  time  and  effort. 

The expected result is, of course, that a trained neural net is presented as evidence of success;
however, the team should at minor milestones present evidence that they have:

1. Characterized the input data set by assembling and preparing it, and selected the
training and test sets.

2.  Chosen  the  development  system,  including  the  operating  system  and  the  hardware  platform. 

3.  Achieved  training  to  the  standards  required  and  successfully  passed  the  preliminary  test 
requirements. 

4.  Created  all  the  user  and  system  interfaces,  and  have  completed  any  test  and  demonstration 
required  by  the  original  specifications. 

The  final  requirement  in  this  phase  is  the  preparation  of  all  the  deliverable  documentation,  and  the 
preparation  of  the  demonstrations  required  by  the  factory  acceptance  tests. 

If  an  integration  of  the  system  into  a  larger  environment  is  required  this  should  be  completed  and 
tested on-site, if possible. If the delivery requires integration into a system on the customer's site,
the  interface  documents  should  be  carefully  pursued  and  the  integration  plan  finalized. 

Major  Milestone  #4:  Delivery  and  Maintenance 

This  phase  should  mark  sign-off  of  the  project.  The  delivery  of  the  required  document  set,  and  the 
test  plan  for  acceptance  should  be  prepared. 

7.3.2 Preliminary Design Review

A  preliminary  design  review  should  occur  after  the  finish  of  the  logical  design  phase.  At  this  point 
the  complete  logical  plan  for  attacking  the  problem  should  be  in  place. 

7.3.3  Critical  Design  Review 

The  critical  design  review  should  be  held  part  way  through  Milestone  3,  at  the  point  where  the 
major  decisions  have  been  made  with  respect  to  platforms  and  the  detailed  training  plans  are  in 
place. 


7.3.4  Pre-Delivery  Preparation 

Predelivery  activity  should  include  the  instantiation  of  the  user  and  system  interfaces,  as  required, 
as  well  as  the  pretesting  preparation  for  the  final  test.  The  documentation  should  be  subject  to 
quality assurance, and the list of deliverables checked off.

7.4  Documentation  and  Configuration  Control 

It is a truism to say that configuration control (or the lack of it) has contributed to more failures and
cost  overruns  in  software  projects  than  any  other  single  cause.  In  neural  engineering  the  need  for 
configuration  control  is  even  more  urgent.  In  addition  to  the  normal  software  documentation,  a 
complete  record  of  the  training  portion  of  the  project  becomes  critical  in  judging  progress,  and 
maintaining  convergence  in  time. 

Of particular importance is a detailed record of each parameter, and the changes effected during
training  runs,  the  number  of  iterations,  the  rate  of  convergence,  and  any  other  tuning  efforts  based 
on  the  heuristics  of  the  team. 

This  record  will  prove  valuable  not  only  in  situations  where  convergence  is  slow  to  occur,  but  will 
be  essential  in  post-delivery  if  modifications  are  to  be  effected  in  the  field. 
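
As an illustration of the kind of training record described above, the following minimal sketch keeps one entry per training run and reports which parameters changed between runs. The class name, field names, and parameter names (`learning_rate`, `momentum`) are hypothetical choices made for this example; the text does not prescribe a format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TrainingRunRecord:
    """One entry in a training log kept under configuration control (illustrative)."""
    run_id: str
    parameters: dict            # parameter names here are assumptions, not prescribed
    iterations: int
    error_history: list = field(default_factory=list)  # error at each logged point
    notes: str = ""             # heuristics tried, reasons for parameter changes

    def convergence_rate(self):
        """Average error reduction per logged step (None if fewer than 2 points)."""
        if len(self.error_history) < 2:
            return None
        drop = self.error_history[0] - self.error_history[-1]
        return drop / (len(self.error_history) - 1)


def parameter_changes(previous, current):
    """Return {name: (old, new)} for every parameter that differs between two runs."""
    names = set(previous.parameters) | set(current.parameters)
    return {
        n: (previous.parameters.get(n), current.parameters.get(n))
        for n in sorted(names)
        if previous.parameters.get(n) != current.parameters.get(n)
    }


run1 = TrainingRunRecord("run-001", {"learning_rate": 0.5, "momentum": 0.9}, 1000,
                         [0.40, 0.30, 0.25])
run2 = TrainingRunRecord("run-002", {"learning_rate": 0.1, "momentum": 0.9}, 5000,
                         [0.40, 0.20, 0.10],
                         notes="reduced learning rate after oscillation")

print(parameter_changes(run1, run2))
print(json.dumps(asdict(run2), indent=2))
```

Serializing each record (here to JSON) lets the log be stored alongside the rest of the controlled document set, so that post-delivery modifications can start from the last known parameter values.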

7.5  Disasters  -  Recovery  and  Containment 

During  phase  three,  a  host  of  difficult  problems  can  arise  that  require  experience  and  often  basic 
knowledge  of  neural  paradigms  to  surmount.  The  response  to  these  problems  will  be  based  on  the 
quality  of  the  implementation  team. 

In  addition  there  may  be  fundamental  problems  which  are  caused  by  decisions  made  during  earlier 
phases  of  the  design  methodology.  Some  of  these  may  be  very  difficult  to  fix  'on-the-fly'  and  can 
contribute  to  project  failure  or,  at  best,  over-runs  of  time  and  money.  These  pathological  errors 
may  occur  at  any  stage  of  the  project  and  can  be  classified  as  either  global  or  detailed.  Global 
errors  occur  during  the  logical  design  phase,  and  usually  indicate  a  restart  of  the  project. 

Global errors include:

Wrong  choice  of  neural  paradigms 
Wrong  choice  of  test  and  training  data 
Wrong  Simulation  System 
Inadequate  Performance 

Detailed  errors  refer  to  inappropriate  choices  made  during  the  implementation  phase,  and  are 
manifested as lack of convergence, lack of performance, or failure of the test set.



8.0  SUMMARY  AND  CONCLUSIONS 


8.1  Summary 

The  Management  Structure: 

The  thesis  underlying  this  presentation  has  been  that  a  project  involving  the  development  of  a 
neural  computing  system  is  subject  to  the  same  general  rules  of  conduct  as  any  other  software 
project. The management structure must account for the special features of such a project, since
there is considerable difference in detail in arriving at a successful neural network compared to
other more familiar software systems. The management structure and the managers must
therefore be familiar with these differences and the issues which they raise if they are to understand
the  importance  of  addressing  these  issues  and  the  alternatives  which  should  be  considered. 

The  Methodology: 

The  methodology  proposed  follows  the  classical  four  major  activities  of:  Requirements  Analysis, 
Logical Design, Implementation, and finally Delivery and Maintenance. Each of these marks a
major milestone in a project and each has a set of activities, issues and specialized expertise
required to successfully traverse the required activities. Most importantly, each can be
reported by preparing a document set which addresses the relevant issues, and the solutions and/or
alternatives found necessary to project a successful conclusion. This reduces the project and its
management to an understood set of activities which are close to those normally found in a
software project. There are, however, some critical differences.

Project Differences: There are perhaps four major differences from more conventional software
projects. First, success is dependent on the availability of existing data; second, the logical
design consists of choosing the set of paradigms most likely to yield success (rather than evolving a
systems algorithm); third, there is no guarantee that the training regimes will converge, or that
the test sets will provide the evidence necessary for validation; and finally, the documentation and
configuration management must be adapted to the neural paradigm.

Software Project Failures: Failures generally tend to be based on very intangible details of the
training and test sets, the wrong choice of paradigm, and/or the lack of convergence of the
training regime. A whole new set of skills and techniques is necessary to rescue a failing project.

Selecting the Paradigm: The field of neural computing is in a state of rapid expansion; however,
there are now what could be called classical approaches. It should be possible to map an
application  onto  a  set  of  potential  paradigms  with  a  fair  degree  of  confidence. 

Selecting  the  Simulation  Platform:  The  simulation  platform  is  a  critical  choice,  as  in  most 
software projects. Aside from the choice of paradigms, the most critical item is assistance in
influencing  the  training  regimes,  monitoring  the  training  progress  and,  if  necessary,  in  debugging 
the  failed  system. 

Training and Test Set Selection: The most critical concern of both the contractor and the customer
should be the selection of the training and the test sets. The training set must reflect the full scope
of potential inputs to the final system, and the test set must reflect the structure of the system as
learned through the training set. Failure to select these sets carefully will cause project failure, in
most cases for the wrong reasons.

Training Failures: The most distressing feature of neural computing is the lack of convergence of
the  training  procedure.  As  discussed  there  are  many  causes,  and  the  field  is  rife  with  heuristics 
and  procedures  for  rescuing  the  situation.  The  main  recourse  is  the  experience  of  the  team,  a 
good  simulation  platform,  and  sometimes  raw  luck. 

8.2 Conclusions

The  Neural  Computing  Field: 

At  the  present  time,  the  movement  of  the  theories  developed  by  neural  science  to  the  practice  of 
neural  engineering  is  progressing  along  the  same  line  that  software  did  twenty  years  ago,  and  in 
the  same  manner  that  expert  systems  software  did  over  the  last  decade.  The  field  is  in  a 
tremendous  growth  phase  with  new  theories,  implementation  platforms,  and  successful  applications 
appearing daily. In addition there is a certain amount of the bandwagon syndrome appearing in
the commercial world, resulting in many claims of neural computing expertise based on very
limited practical experience. In such an environment it is easy to be
misinformed  and  misdirected.  Many  of  the  developed  systems  have  been  level-of-effort  projects 
with  an  open  budget;  this  situation  must  evolve  towards  a  more  commercial  development  project 
with  the  standard  management,  documentation  and  control  procedures,  if  the  field  is  to  mature  into 
a  professional  discipline. 

Software  or  Hardware: 

In  the  end,  a  neural  network  will  be  considered  as  one  of  the  alternatives  to  solving  a  problem.  Its 
inclusion  in  a  hybrid  system  [C-1,2]  composed  of  classical  algorithmic  software  and  rule-based 
software  will  depend  on  the  nature  of  the  problem  and  the  capabilities  of  the  different  paradigms. 
This  trend  is  already  noticeable  in  hybrid  systems  of  algorithms  and  rule-based  systems.  The 
implementation  of  neural  paradigms  will  follow  the  path  of  special  purpose  software  which  has 
been  relegated  to  microcode  for  such  applications  as  input/output  drivers,  and  will  be  implemented 
on  special  purpose  coprocessor  boards. 

Contract  Award: 

If  contracts  are  to  be  let,  a  review  of  the  design  methodology  and  the  project  management  plan 
should  be  an  integral  part  of  the  assessment  procedure  of  the  received  bids.  While  there  is  still  a 
level  of  uncertainty  in  the  convergence  of  most  neural  network  projects,  there  are  good 
engineering  design  approaches  which  will  minimize  this  risk  and  often  contribute  to  the  success  of 
the  whole  project. 

Project  Management: 

In  a  larger  view  of  the  development  of  a  software  solution,  the  need  for  a  neural  network  would 
evolve,  as  part  of  the  logical  systems  design,  in  response  to  the  demands  on  the  functionality,  the 
input  data,  and  other  knowledge.  The  methodology  proposed  here  has  begun  with  the  implicit 
assumption that a neural solution has been decided upon. It nevertheless proposes a distinct set
of  phases  in  which  progress  can  be  measured,  and  issues  faced  at  the  appropriate  time.  This 
certainly  provides  management  control  and  leaves  the  development  team  with  a  set  of  guidelines 
for  ensuring  that  all  options  are  explored  in  a  systematic  manner.  It  also  suggests  when  things  may 
be  going  astray  and  convergence  may  not  be  occurring.  The  steps  proposed  may  be  traversed 
quickly with an experienced project team on familiar ground; however, an awareness on the part
of  the  team  and  management  of  the  logical  sequence  of  considerations  and  issues  lends  order  and 
structure  to  the  project. 

The  Near  Future: 

The neural computing field is in a state of rapid change: in theories, in new architectures, and in
computational platforms (software and hardware). Design and implementation teams will need a
constant infusion of updated concepts and information to make full use of this technology. This
statement  is  also  true  for  those  with  applications  that  might  benefit  from  the  use  of  this  technology. 

Finally, we have not considered the development of neural paradigms using new hardware neurons
[D-1,2,3]. It is clear that large, high performance neural networks will be implemented using a
variety  of  silicon  or  perhaps  optical  (perhaps,  even  biological)  devices  to  simulate  the  neuron. 
Some  of  these  will  be  trainable  and  some  will  accept  weights  derived  from  software  simulation. 
This  combination  will  offer  an  interesting  challenge  as  the  software  and  hardware  engineers  join 
their  methodological  requirements  to  achieve  very  large  neural  networks. 

Table 1: Characteristics of Neural Paradigms

Summary

The  computational  complexity  of  a  processing 
function  is  a  driving  factor  in  the  implementation 
of  that  function  in  an  operational  system.  Artificial 
neural  networks  offer  the  potential  for  significant 
improvements  in  the  computational  complexity  of  a 
number  of  guidance  and  control  functions.  To 
illustrate  such  an  improvement,  this  paper 
considers  a  comparison  between  two  different 
approaches to object detection and recognition: a
traditional  approach  employing  a  wide  field  of  view 
and  constant  spatial  resolution  throughout  the 
image  sensing  and  processing  chain,  and  a  foveal 
approach  utilizing  a  roving  “eyeball”  circularly 
symmetric  sampling  grid  with  a  radially  variant 
resolution  in  the  processing  chain.  The  rationale 
and  characteristics  of  these  two  approaches  are 
described and compared. Quantitative evaluations
of  the  processing  loads  and  data  transfer  rates  are 
then  carried  out  for  both  approaches.  These 
processing  requirements  are  then  compared  and  the 
operational  implications  of  this  comparison  are 
discussed.  While  this  paper  does  not  explicitly 
discuss the efficacy of the foveal approach,
references  to  relevant  research  results  in  this  regard 
are  provided. 

1  Introduction 

Object  detection  and  recognition  are  image 
analysis  operations  that  are  of  central  importance 
for  the  guidance  and  control  of  many  types  of 
modern  weapons.  Unfortunately,  except  for  the 
simplest  types  of  objects  (e.g.,  “hot  blobs”  in 
infrared  and  radar  imagery)  and  the  simplest 
operational  scenarios,  the  computational  and  data 
transfer requirements connected with these
operations are orders of magnitude beyond current
real time on-board processing capabilities. Thus,
even though competent image object detection and
recognition  systems  can  be  built,  most  such 
systems  cannot  be  employed  for  guidance  and 
control  because  of  size,  mass,  power,  and  cost 
limitations.  What  is  needed  is  a  new  approach  that 
can  significantly  reduce  the  computational  burden 
and  data  transfer  requirements  associated  with 
object detection and recognition.

Most  current  approaches  to  image  object 
detection  and  recognition  employ  sensors  that  are 
designed  following  the  tradition  of  television.  This 
is  logical,  since  an  enormous  technological 
infrastructure  exists  for  such  devices.  However, 
most  current  systems  continue  the  analogy  all  the 
way  through  the  entire  processing  chain.  In  other 
words,  at  each  stage  of  processing,  the  image  pixels 
or  features  that  are  used  are  sampled  at  regular 
intervals  across  the  entire  image  or  subimage. 

While  this  seems  particularly  natural  (because  of 
our  television  mentality),  it  is  not  necessarily  an 
optimal  or  cost-effective  approach  for  object 
detection  and  recognition.  In  fact,  this  approach 
clearly  ignores  the  design  principles  employed  in 
biological  vision  systems. 

Unlike  television  systems,  the  visual  systems  of 
animals  are  optimized  for  object  detection  and 
recognition  —  not  for  image  rendering.  No 
example  of  a  constant  resolution  image  sensor  or 
image  processing  system  exists  among  the 
vertebrates  (some  insects  have  such  systems). 
Vertebrate animal visual systems are based upon
foveal sensors and foveal processing. Such systems
provide the advantage of high visual acuity within
a small central field, with resolution that drops off
rapidly with radial distance from the center. Such
foveal  vision  systems  must  employ  eyeballs  to  allow 
the  high  central  resolution  of  the  foveal  sensor  to 
be  rapidly  moved  to  different  locations  within  the 
scene.  The  primary  thesis  of  this  paper  is  the  claim 
that  military  object  detection  and  recognition 
systems built upon this foveal eyeball concept
deserve  intensive  investigation. 

The  next  section  provides  detailed  descriptions  of 
two  different  image  object  detection  and 
recognition  architectures:  a  constant  resolution 
architecture,  and  a  foveal  architecture.  In 
Section  3,  these  two  architectures  are  compared. 
Finally,  in  Section  4,  the  potential  military 
operational  implications  of  this  comparison  are 
discussed. 

2  Two  Architectures 

In  this  section,  the  designs  for  two  hypothetical 
object  detection  and  recognition  systems  (a 
constant  resolution  system  and  a  foveal  system)  are 
discussed. To focus the discussion, we shall assume
an image-based object detection and classification
system having a 1024 x 1024 pixel imaging sensor
looking down at the ground obliquely from an
airborne platform which always flies at about the
same  altitude  above  the  ground.  It  is  further 
assumed  that  the  range  is  such  that  the  number  of 
pixels  on  each  object  is  reasonably  large.  The 
analysis  in  this  section  will  concentrate  on 
estimating  the  processing  required  to  carry  out 
object  detection  and  classification  for  a  single 
frame  of  this  imagery.  It  is  assumed  that  there  are 
40  object  classes  of  interest  and  an  average  of  12 
objects  per  frame.  The  next  section  compares  the 
results  obtained  in  this  section  for  the  two  system 
concepts. 

2.1  A  Constant  Resolution  System 

This  subsection  describes  an  object  detection  and 
recognition  system  concept  that  uses  constant 
resolution imagery and constant resolution
processing. The system employs a two-stage
processing approach to reduce the computational
burden while maintaining high probability of
detection and classification rates (see Figure 1).
The first processing stage performs object detection
using a small number of features computed across
the entire image. The result of this processing
stage is a set of potential object locations. At each
potential object location, the second stage of
processing eliminates false alarms and classifies the
true objects. This stage of processing uses a larger
number of features than the first stage.

We begin with a brief discussion of the features
that are used at both processing levels. Next, the
two processing stages are described in detail.


2.1.1  Feature  Extraction 

Both  the  primary  and  secondary  feature  extractors 
use  Gabor  logons  (see  Figure  2)  as  the  feature  set. 
Gabor logons, originally introduced in the context
of  uncertainty  theory  for  information  [9],  have  been 
widely  used  in  image  processing  and  machine  vision 
since  Daugman  extended  the  original  work  to 
two-dimensions  [8].  Examples  of  Gabor  logons  in 
image  processing  include  image  compression  [4], 
image  reconstruction  [13],  texture  segmentation  [4], 
feature  extraction  and  pattern  recognition  [3,19]. 

The  primary  advantage  of  Gabor  logons  is  that 
they  provide  local  spatial  frequency  information 
which has been demonstrated to be sufficient for
many  types  of  object  detection  and  classification. 

A  logon  is  constructed  from  a  sinusoidal  grating 
function  weighted  by  a  two-dimensional  Gaussian. 
The  sinusoid  portion  of  the  logon  introduces  a 
“waviness”,  whereas  the  Gaussian  portion  localizes 
the  logon  to  a  region  of  the  image  that  surrounds 
the  location  corresponding  to  the  mean  of  the 
Gaussian.  The  extent  of  the  Gaussian  and 
subsequently  the  logon  is  determined  by  the 
variance  of  the  Gaussian.  The  mathematical  form 
of  a  logon  can  be  written  as 

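$$G(x,y) = \exp\left\{-\pi\left[(x-x_0)^2\alpha^2 + (y-y_0)^2\beta^2\right]\right\}\,\exp\left\{-2\pi i\left[u_0(x-x_0) + v_0(y-y_0)\right]\right\}$$

where $(x_0, y_0)$ are position parameters which
localize the function to a region of the image,
$(u_0, v_0)$ are modulation parameters which orient
the function to a preferred direction and spatial
frequency, and $(\alpha, \beta)$ are scale parameters which
determine the spatial extent of the logon.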
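
The construction can be sketched in a few lines of Python. This is a minimal illustration that assumes the logon takes the complex form used by Daugman (a Gaussian envelope multiplied by a complex exponential carrier); the function names, the 9 x 9 grid, and the parameter values are hypothetical choices for the example, not values taken from the system described here.

```python
import cmath
import math


def gabor_logon(x, y, x0, y0, u0, v0, alpha, beta):
    """Complex 2-D Gabor logon at (x, y); Daugman-style parameterization (assumed)."""
    # Gaussian envelope: localizes the logon around (x0, y0).
    envelope = math.exp(-math.pi * ((x - x0) ** 2 * alpha ** 2
                                    + (y - y0) ** 2 * beta ** 2))
    # Complex sinusoid: sets the preferred orientation and spatial frequency.
    carrier = cmath.exp(-2j * math.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier


def logon_kernel(size, u0, v0, alpha, beta):
    """Sample the logon on a size x size grid centred on the middle pixel."""
    c = size // 2
    return [[gabor_logon(x, y, c, c, u0, v0, alpha, beta) for x in range(size)]
            for y in range(size)]


# Illustrative parameters only: horizontal modulation of 1/8 cycle per pixel.
kernel = logon_kernel(size=9, u0=0.125, v0=0.0, alpha=0.2, beta=0.2)
print(kernel[4][4])        # value at the Gaussian mean
print(abs(kernel[4][0]))   # magnitude decays away from the centre
```

The real part of each sample corresponds to a cosine-phase logon and the imaginary part to a sine-phase logon, so one complex kernel carries both phases.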

As  demonstrated  in  [8],  the  two-dimensional 
Gabor  logons  are  not  orthogonal  functions. 
Therefore,  the  decomposition  of  an  image  into  a  set 
of  logon  coefficients  cannot  be  performed  by  simply 
projecting  the  image  onto  the  logons.  Daugman  [4] 
has  developed  a  neural  network-based  method  for 
decomposing  an  arbitrary  image  into  a  set  of 
logons.  This  method  uses  a  relaxation  process  to 
achieve  a  minimum  mean  squared  error  fit  of  the 
image  to  the  set  of  logons. 

While  this  method  works  well,  it  is  very 
computationally intensive. Therefore, the object
detection and recognition systems described below
use the projection of the image onto each logon.
The “cross-talk” in the resulting logon coefficients
is ignored.
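
A direct projection of this kind amounts to an inner product between the image patch and the kernel. The sketch below shows one way to compute a single coefficient; the function name, the zero treatment of pixels outside the image, and the use of the complex conjugate of the kernel are assumptions made for illustration.

```python
def project(image, kernel, row, col):
    """Inner product of a kernel with the image patch centred at (row, col).

    Pixels that fall outside the image are treated as zero (an assumption).
    """
    half = len(kernel) // 2
    total = 0j
    for kr, kernel_row in enumerate(kernel):
        for kc, k in enumerate(kernel_row):
            r, c = row + kr - half, col + kc - half
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                total += image[r][c] * complex(k).conjugate()
    return total


# Toy data: a uniform 5 x 5 image and a uniform 3 x 3 kernel.
image = [[1.0] * 5 for _ in range(5)]
kernel = [[1 + 0j] * 3 for _ in range(3)]

print(project(image, kernel, 2, 2))   # full overlap with the image
print(project(image, kernel, 0, 0))   # partial overlap at the corner
```

Because each coefficient is computed independently, no account is taken of the overlap between non-orthogonal logons, which is the cross-talk the text chooses to ignore.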

Figure 1: Constant resolution image object detection and recognition system design.

Figure 2: Spatial frequency detection kernels - sine and cosine Gabor logon wavelet features.

bandwidth between the first and second stages.
This  bandwidth  reduction  is  accomplished  while 
maintaining  a  low  object  miss  rate. 


Figure 3: Examples of geometric shapes used for object detection.


2.1.2  Object  Detection 

The  first  stage  of  processing  is  object  detection. 
The  approach  [20]  consists  of  calculating  a  set  of 
low  resolution  Gabor  logon  coefficients  and 
comparing  these  coefficients  with  those  derived 
from  a  set  of  simple  geometric  shapes.  The  use  of 
low  resolution  logons  provides  resistance  to  noise 
and background clutter, while comparison with
simple geometric shapes reduces false alarms. In
order to ensure that all objects in the image are
detected, the object detection process is applied
throughout the image on a sampling grid of every
fourth pixel.

In Figure 1, the primary feature extractor
calculates  16  complex  Gabor  coefficients 
corresponding  to  two  spatial  scales,  eight 
orientations,  and  two  phases  at  every  fourth  pixel 
location.  This  results  in  1,048,576  coefficients  and 
requires  36  billion  arithmetic  operations  per  image. 
These  coefficients  are  passed  to  the  object  detection 
module  which  compares  the  16  coefficients  at  each 
pixel  location  to  the  coefficients  derived  from  each 
of  five  geometric  shapes  (see  Figure  3). 
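
The coefficient count quoted above can be checked with a short calculation. The sketch assumes that the two phases form the real and imaginary parts of each complex coefficient, so that two scales and eight orientations give the 16 complex coefficients per location; the per-coefficient operation count is simply the quoted 36 billion divided by the number of coefficients, not a figure given in the text.

```python
image_side = 1024            # pixels per side (from the text)
grid_step = 4                # features computed at every fourth pixel
scales, orientations = 2, 8
# Assumption: the two phases are the real and imaginary parts of one complex value.
complex_coeffs_per_location = scales * orientations

locations = (image_side // grid_step) ** 2
total_coeffs = locations * complex_coeffs_per_location

quoted_ops = 36e9            # "36 billion arithmetic operations" from the text
ops_per_coeff = quoted_ops / total_coeffs

print(locations)             # 65536 sampled locations
print(total_coeffs)          # 1048576, matching the figure in the text
print(round(ops_per_coeff))  # implied operations per complex coefficient
```

Under these assumptions the quoted totals imply roughly 34,000 arithmetic operations per complex coefficient.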

The  comparison  is  performed  using  a  normalized 
similarity function derived from [3]:

$$S(i,j) = \frac{Q(i,j) \cdot G(p)}{\|Q(i,j)\| \, \|G(p)\|}$$

where $S(i,j)$ is the similarity function, $Q(i,j)$ is
the Gabor feature vector at point $(i,j)$ in the
image, and $G(p)$ is the Gabor feature vector of the
matching geometric shape. This similarity function
is normalized to the interval (0,1).
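
A minimal sketch of such a comparison follows, assuming that the similarity is a normalized dot product between real-valued feature vectors (for example, coefficient magnitudes) and that the best match over the reference shapes is retained. Both function names and the toy vectors are hypothetical.

```python
import math


def similarity(q, g):
    """Normalized dot product of two real feature vectors (assumed form of S)."""
    dot = sum(a * b for a, b in zip(q, g))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in g))
    return dot / norm if norm > 0 else 0.0


def best_shape_similarity(q, shape_vectors):
    """Highest similarity between q and any of the reference shape vectors."""
    return max(similarity(q, g) for g in shape_vectors)


# Toy two-element vectors; the system described in the text uses 16 coefficients.
shapes = [[0.0, 1.0], [6.0, 8.0]]
print(similarity([3.0, 4.0], [6.0, 8.0]))         # parallel vectors
print(similarity([1.0, 0.0], [0.0, 1.0]))         # orthogonal vectors
print(best_shape_similarity([3.0, 4.0], shapes))
```

With non-negative feature values the result always lies between 0 and 1, and it is independent of the overall scale of either vector, which makes a single detection threshold meaningful across the image.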

The  similarity  values  at  each  sampled  pixel  are 
then compared with a threshold. Those pixels with
similarity values above the threshold are considered
potential object locations and are passed on to the
second stage of processing. In general, a very large
fraction of the pixels will be below the threshold
and therefore will not be processed further,
resulting in a significant reduction in processing.
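
The thresholding step can be sketched as follows. The function name, the 3 x 3 similarity map, and the threshold of 0.8 are illustrative assumptions; the text does not state the threshold value used.

```python
def candidate_locations(similarity_map, threshold, grid_step=4):
    """Return image (row, col) positions whose similarity exceeds the threshold.

    similarity_map[i][j] holds the similarity at sampled grid point (i, j);
    grid_step converts grid indices back to pixel coordinates.
    """
    return [(i * grid_step, j * grid_step)
            for i, row in enumerate(similarity_map)
            for j, s in enumerate(row)
            if s > threshold]


demo_map = [[0.10, 0.20, 0.15],
            [0.05, 0.92, 0.30],
            [0.12, 0.25, 0.88]]

print(candidate_locations(demo_map, threshold=0.8))   # [(4, 4), (8, 8)]
```

Only the returned locations are handed to the second stage, so its cost scales with the number of candidates rather than with the image size.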


2.1.3  Object  Recognition 

The  second  stage  of  processing  is  object 
recognition.  The  processing  at  this  stage  consists  of 
extracting  a  number  of  higher  resolution  Gabor 
logon  coefficients  and  inputting  these  coefficients  to 
a  backpropagation  neural  network  [11].  The 
backpropagation  network  has  been  trained  to 
classify  its  input  into  one  of  the  40  object  classes  or 
the “no object” class. This processing is applied
only  at  those  pixel  locations  that  were  above 
threshold  in  the  object  detection  stage.  The 
following  discussion  assumes  that  there  are  100 
such  points. 

At  each  potential  object  location,  the  secondary 
feature  extractor  calculates  56  complex  Gabor 
coefficients  corresponding  to  seven  spatial  scales, 
eight  orientations,  and  two  phases.  The  spatial 
scales  include  the  two  scales  used  in  the  detection 
processing  as  well  as  five  additional  higher 
resolution  scales.  In  addition  to  these  56 
coefficients,  the  secondary  feature  extractor 
calculates  56  coefficients  at  each  of  4  adjacent 
locations  for  a  total  of  280  coefficients.  These 
adjacent  locations  are  typically  within  a  few  tens  of 
pixels  of  the  potential  object  location,  and  result  in 
a  more  robust  classifier  that  is  insensitive  to  the 
precise  position  of  the  potential  object  location  on 
the  object. 

The  magnitude  of  each  complex  coefficient  is 
calculated  and  the  resulting  280-dimensional  vector 
is  presented  to  a  backpropagation  classification 
network.  This  network  is  trained  to  classify  its 
input  vector  into  one  of  40  object  classes  or  the 
not-an-object  class.  Through  training  on  actual 
examples  of  objects  and  false  alarms,  the  network 
is  able  to  achieve  a  low  false  alarm  rate  and  a  high 
probability of correct classification. It is worth
noting  that  this  constant  resolution  method  is  itself 
much  more  economical  from  a  computational 
standpoint  than  most  classical  approaches  which 
often  require  yet  another  order  of  magnitude  more 
processing  per  image. 
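
The dimensions of the classifier input and output described above can be verified with a short sketch. The placeholder coefficients are invented for the example (real values would come from the secondary feature extractor), and the function name is hypothetical.

```python
def recognition_input(coeffs_by_location):
    """Flatten complex coefficients from all sample locations into magnitudes."""
    return [abs(c) for location in coeffs_by_location for c in location]


scales, orientations = 7, 8
per_location = scales * orientations      # 56 complex coefficients per location
n_locations = 1 + 4                       # candidate point plus 4 adjacent points

# Placeholder data: 3+4j has magnitude 5.
coeffs = [[3 + 4j] * per_location for _ in range(n_locations)]
vector = recognition_input(coeffs)

n_outputs = 40 + 1                        # 40 object classes plus "no object"
print(len(vector), n_outputs)             # 280 41
```

Taking magnitudes discards the phase of each coefficient, which is consistent with the stated aim of making the classifier insensitive to the precise position of the candidate point on the object.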

2.2  A  Foveal  Rosette  System 

In the recent past, a number of researchers
[14,15,16,17,18,19,10] have advocated a
fundamentally new approach to image object
detection and classification. This approach, which
we shall call “eyeball” vision, is based upon a crude
analogy with mammalian vision systems. The idea
is to utilize many of the successful methods already
developed in machine vision research, and modify
these methods to work with a much smaller set of
multiresolution wavelet features that are sampled
in a non-uniform foveal pattern (see Figure 4).

Figure 4: A foveal image feature sampling pattern
rosette with 3 rings and 16 spokes. The radius of
each ring is twice that of the previous ring. Spatial
frequency features are gathered at the central
fixation point of the rosette and at the points of
intersection of the rings and spokes. For many
objects, these spatial frequency feature sets provide a
unique signature, assuming that the central point of
the rosette is placed at an approximately repeatable
position on the object. Rules and neural networks
for moving the rosette (a saccade generation system)
ensure that the rosette moves repeatably to similar
fixation points on similar objects in different images.
A saliency detector neural network can be used to
determine when an object-identification-relevant
fixation point (a nexus point) has been found.

The  idea  of  foveal  sampling  is  that  of  having  an 
agile,  readily  movable  sensor  that  moves 
intelligently  from  fixed  point  to  fixed  point  in  the 
image  to  carry  out  the  object  detection, 
classification,  tracking,  and  measurement  functions. 

As shown by Zeevi [19], Rybak [14,15,16,17], and
von der Malsburg [2,3,12], the operations required
to carry out most object acquisition and object
recognition operations in images can be carried out
using a relatively small ensemble of spatial
frequency and image intensity features. In fact, for
a foveated image sampling pattern, Rybak [14]
proposes that as few as 833 real-valued, local image
features are sufficient for carrying out many
practical object acquisition and recognition
functions. The work of Zeevi [19] and von der
Malsburg [2,3,12] supports Rybak’s conclusions. In
this paper we will discuss a slightly modified
version of Rybak’s foveal rosette system [14].

The  point  on  the  image  that  lies  at  the  center  of 
the  foveal  sampling  pattern  (the  rosette)  is  called  a 
fixation  point.  As  in  biological  vision,  the 
movement  of  the  rosette  from  one  fixation  point  to 
the next is known as a “saccade”. Saccades are
generated primarily by exploiting feature data
gathered at the sparse peripheral sampling points
of  the  rosette.  No  information  processing  occurs 
during  a  saccade.  Processing  only  occurs  during 
pauses  of  the  rosette  at  fixation  points. 

The  goal  of  saccade  generation  is  to  ultimately 
move  the  center  of  the  rosette  to  a  repeatable 
position  on  each  object  of  interest  within  the  scene. 
Once  an  object  is  approximately  centered  in  the 
rosette,  it  is  classified  utilizing  the  features 
gathered  in  the  high  acuity  central  region  of  the 
rosette.  At  least  this  is  the  case  for  compact  objects 
(which  will  be  the  focus  of  this  paper).  Extended 
objects  (objects  larger  than  the  two  central  rings  of 
the  rosette  -  see  Figure  4)  can  only  be  classified  by 
linking  information  gathered  at  multiple  fixation 
points  located  on  the  object.  Building  an  eyeball 
vision  system  for  detecting  and  classifying  such 
extended  objects  will  probably  be  more  difficult 
than  for  compact  objects.  Since  almost  all  military 
object  detection  and  classification  problems  can  be 
solved  within  the  confines  of  a  compact  object 
restriction,  we  will  consider  only  compact  objects. 

In  the  presentation  below,  we  begin  with  a 
discussion  of  the  foveal  rosette  and  a  basic  set  of 
features  that  are  derived  from  the  image  at  each  of 
the  rosette  sampling  points.  The  feature  set 
presented  here,  while  sufficient  for  an  initial 
development  effort,  should  probably  be  expanded 
for  an  operational  system  to  include  additional 
important saccade generation cues such as color
gradients  and  frame-to-frame  motion  cues.  The 
issue  of  exactly  how  the  foveal  rosette  features  can 
be  physically  extracted  from  the  scene  is  also 
discussed.  Following  the  discussion  of  the  foveal 
rosette and the feature set, methodologies for
object  detection  and  recognition  are  reviewed. 
Finally,  methods  for  developing  a  feature  vector 
library  for  use  in  classification  are  presented. 

2.2.1  The  Foveal  Rosette 

The  foveal  rosette  (see  Figure  4)  is  nothing  but  a 
sampling  frame.  At  each  intersection  of  a  radial 
line  and  a  ring  (or  of  the  radial  lines  themselves  — 
namely,  the  center),  a  set  of  features  is  gathered. 
The  essential  element  of  the  rosette  is  the  density 
of the features, not their regular spacing. In fact,
randomly  located  sampling  points  could  just  as 
well  be  used,  as  long  as  their  average  density  were 
to  fall  off  properly  with  radial  distance  from  the 
center  of  the  rosette.  Regular  spacing  simply  makes 
the  system  easier  to  describe  and  work  with.  It  also 
facilitates  the  efficient  mathematical  comparison  of 
the feature sets gathered at different fixation points
on  different  images. 
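The sampling frame described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rosette_points`, the image coordinates, and the inner ring radius of 20 pixels are hypothetical, while the three rings, 16 spokes, and ring radii that double from ring to ring follow Figure 4.

```python
import math

def rosette_points(cx, cy, inner_radius, n_rings=3, n_spokes=16):
    """Return (x, y, ring) sampling points of a foveal rosette.

    Ring 0 is the central fixation point. Ring k (k >= 1) has radius
    inner_radius * 2**(k - 1), so each ring doubles the previous one.
    inner_radius is an assumed value; the text does not specify it.
    """
    points = [(float(cx), float(cy), 0)]
    for ring in range(1, n_rings + 1):
        radius = inner_radius * 2 ** (ring - 1)
        for spoke in range(n_spokes):
            angle = 2.0 * math.pi * spoke / n_spokes
            points.append((cx + radius * math.cos(angle),
                           cy + radius * math.sin(angle),
                           ring))
    return points

points = rosette_points(cx=256, cy=256, inner_radius=20)
print(len(points))  # 1 + 3 * 16 = 49 sampling points
```

The density of points falls off with distance from the center because every ring carries the same 16 points while its circumference doubles.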

The  features  that  are  extracted  at  each  point  of 
the  rosette  could  be  almost  anything.  For  example, 
we  might  measure  the  local  spatial  frequency  of  the 
image  at  one  or  more  scales.  Another  possibility 
would  be  to  detect  local  image  flow  or  measure 
color gradients. To allow comparison with the
constant resolution system described above, we
shall  concentrate  solely  on  the  use  of  local  spatial 
frequency  features  (specifically,  Gabor  logons,  as 
shown  in  Figure  2). 

The  specific  feature  set  that  we  will  discuss  in 
this  paper  is  based  upon  the  concepts  of  Rybak  and 
his colleagues [14,15,16,17]. Rybak’s idea is that
the  spatial  frequency  measurements  at  each  sample 
point  are  made  with  both  sine  and  cosine  Gabor 
logon  correlation  kernels  at  eight  different  angles 
equally  spaced  between  0  degrees  (vertical)  and 
157.5  degrees  (the  opposite  azimuths  are  covered 
by  the  symmetry  of  the  kernels)  -  see  Figure  5. 
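As an illustration of such a kernel bank, the sketch below builds a cosine and a sine Gabor kernel at each of the eight angles. It is not taken from Rybak's work: the function name `gabor_pair`, the 20 x 20 kernel size, the wavelength of 8 pixels, the Gaussian width, and the choice of which image axis corresponds to 0 degrees are all assumptions made for the example.

```python
import math

def gabor_pair(size, wavelength, angle_deg, sigma=None):
    """Return (cosine_kernel, sine_kernel) as size x size nested lists.

    angle_deg is measured from the vertical axis (an assumed convention).
    sigma is the width of the Gaussian envelope; the default of half a
    wavelength is an assumption, not a value from the text.
    """
    if sigma is None:
        sigma = 0.5 * wavelength
    theta = math.radians(angle_deg)
    half = (size - 1) / 2.0
    cos_k, sin_k = [], []
    for row in range(size):
        y = row - half
        cos_row, sin_row = [], []
        for col in range(size):
            x = col - half
            # Coordinate along the direction of modulation.
            u = x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            phase = 2.0 * math.pi * u / wavelength
            cos_row.append(envelope * math.cos(phase))
            sin_row.append(envelope * math.sin(phase))
        cos_k.append(cos_row)
        sin_k.append(sin_row)
    return cos_k, sin_k

angles = [22.5 * k for k in range(8)]  # 0, 22.5, ..., 157.5 degrees
bank = [gabor_pair(size=20, wavelength=8.0, angle_deg=a) for a in angles]
```

Rotating a kernel by a further 180 degrees leaves the cosine kernel unchanged and only flips the sign of the sine kernel, which is why the eight angles between 0 and 157.5 degrees suffice.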

We assume that the objects of interest have a
spatial frequency structure such that the objects
can be uniquely and easily classified by means of
spatial frequency measurements at two scales that
are a fixed percentage of the overall object size
(and the same for all object types). Further, we
assume that the object’s size can differ by no more
than a factor ranging from 1/2 to 2 from some
mean. While these assumptions may seem quite
limiting, they really are not. Surprisingly, as
Rybak has shown [14], the foveal spatial frequency
features used here are capable of being reversed to
reconstruct an immediately recognizable
approximate version of the portions of the original
image that were sampled. In most cases the
reconstruction is quite sufficient to readily visually
recognize the objects in the image. Rybak obtained
this result with only 833 features per rosette.

[Figure 5 panels: spatial frequency features with octave frequency spacings; eight orientations used at each spatial frequency]

Figure 5: Spatial frequency kernels with different
spatial frequencies are used at different sampling
positions in the rosette. The spatial frequencies get
smaller (i.e., the kernels get physically larger) by a
factor of 1/2 on each successively larger ring. Sixteen
orientations of each sine and cosine kernel are used
at each sampling position (only eight orientations
are shown here; the others are derived by means of
symmetry).

Experience from the DARPA Neural Network
ATR  project  and  on  other  image  analysis  projects 
suggests  that  most  military  objects  can  be 
classified  by  measuring  spatial  frequency  content  at 
no  more  than  two  spatial  frequencies  that  are  a 
fixed  percentage  of  the  object’s  size.  In  fact,  almost 
all  objects  have  this  property.  With  the  advent  of 
inertial  navigation  systems,  GPS,  laser  ranging, 
etc.,  almost  all  military  imagery  provides  detailed 
information  about  the  approximate  scale  of  objects 
within  a  specific  image.  Thus,  by  means  of  either 
optical  lenses/telescopes  or  digital  image 
processing, the sizes of objects of interest within an
image can be controlled to within a factor of 1/2 to
2 of a desired mean. This is usually simple to
arrange in almost any application (e.g., a missile
seeker, a reconnaissance system, an imaging radar,
etc.). If necessary, the range of object sizes over
which the system can function can be increased.
However, this would add cost.

An important issue regarding eyeball vision is the
nature of the sensing and feature extraction
hardware. Clearly, the necessary sensing and
feature extraction operations can be carried out
using an ordinary television-type camera and
digital image processing. While this will work, it
may not be the most cost-effective solution in the
long run. Specialized sensors that directly extract
foveal rosette sampled features from a scene, such
as Zeevi’s CCD delay line scheme [19], may
ultimately provide a more cost-effective solution.
In the discussion that follows, we will not concern
ourselves with the specific details of how the set of
features is derived from the scene. We shall simply
assume the existence of the rosette sampling
pattern and the associated spatial frequency
features (although we shall count the calculations
required to extract them).

The specific features we will discuss are shown in
Figures 2 and 5 (see [4,5,6,7,8] for details). At each
sample point, we calculate Gabor logon wavelet
features of a single spatial frequency at eight
different orientation angles, using both sine and
cosine logons. The scale of the spatial frequency
features at each ring is 2 times the scale of the
corresponding features at the ring just inside of it.
The spatial frequency of the features used on the
first ring is the same as that used at the center.
The rings themselves have radii that increase by a
factor of 2 between successive rings. This feature
set, with some further tuning and refinement, is
probably adequate for many object detection and
classification problems.

2.2.2  Object  Detection  and  Saccade  Generation 

In  the  eyeball  vision  concept,  object  detection 
involves  two  processes: 

•  Movement  of  the  rosette  to  positions  where 
objects  are  likely  to  be  found. 

•  Determination  that  an  object  of  interest  lies  at 
or very near the center of the rosette.

Movement  of  the  rosette  to  positions  where 
objects  might  be  located  is  carried  out  via  a  set  of 
neural  networks  and  rules.  These  networks  and 
rules  determine  (based  upon  feature  information 
gathered  at  the  current  rosette  position  and  at 
previous  rosette  positions)  whether  an  object  of 
interest  is  likely  to  be  in  a  particular  direction.  For 
example,  one  rule  that  Rybak  has  explored  is  to 
follow  a  prominent  extended  edge  and  look  for 
areas  of  concentrated  “line  activity”  at  a  specific 
point.  Such  points  of  concentrated  line  activity 
(i.e., multiple strong line processes at different
angles  located  at  approximately  the  same  location) 
are  known  as  nexus  points.  In  a  typical  military 
object  detection  and  classification  problem,  objects 
of  interest  have  one  or  more  nexus  points,  whereas 
most  other  objects  in  the  environment  do  not. 
Another  rule  might  be  that,  if  a  particular  edge 
process  is  followed  in  search  of  nexus  points,  one 
might later revisit this same edge process and
search for it in the opposite direction. In the
instance of such a rule, the periphery of each
rosette would be carefully searched for evidence
suggesting an extended edge process. This would
then be used in formulating future saccades to
examine it. A saliency detection neural network can
also be used to augment the rule set to determine
whether or not a particular fixation point is a
nexus point on an object of interest.
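The notion of a nexus point as a place where several strong line processes at different angles coincide can be illustrated with a very simple rule. The sketch below is a hypothetical stand-in, not one of Rybak's rules and not the saliency network: the function name `is_nexus_candidate`, the threshold of 0.5, and the minimum of two strong orientations are assumptions chosen only for the example.

```python
def is_nexus_candidate(orientation_magnitudes, threshold=0.5, min_lines=2):
    """Flag a point where several strong line responses at different angles coincide.

    orientation_magnitudes: one non-negative response per orientation.
    threshold, min_lines: hypothetical tuning values, not from the text.
    """
    strong = [m for m in orientation_magnitudes if m >= threshold]
    return len(strong) >= min_lines

edge_only = [0.9, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1]  # one dominant line
corner = [0.9, 0.1, 0.0, 0.1, 0.8, 0.1, 0.0, 0.1]     # two lines at different angles
print(is_nexus_candidate(edge_only), is_nexus_candidate(corner))  # False True
```

A trained saliency network would replace the fixed threshold with a learned decision over the full rosette feature set.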

Obviously,  some  kind  of  feature  classification 
process  must  be  used  in  saccade  generation.  One 
option  is  to  have  a  separate  feature  analyzer  for 
each  distinct  set  of  saccade  generation  rules.  For 
example,  the  extended  edge  following  rule  might 
use  a  classifier  that  looks  for  and  locates  extended 
edges  in  the  scene  using  feature  data  from  each 
rosette. Another process might involve looking for
edge  intersections  to  determine  the  locations  of 
potential  nexus  points.  In  either  case,  a  set  of  rules 
for  moving  the  rosette  is  required.  Saccade 
generation  is  clearly  the  area  of  eyeball  vision  that 
has  the  greatest  need  for  additional  research. 
Notwithstanding  the  need  for  more  research  on 
saccade  generation,  even  the  current  crude  systems 
work  remarkably  well  (see  [15]  for  an  impressive 
example). 

2.2.3 Object Recognition

As  the  foveal  rosette  is  moved  about  the  scene  by 
the  saccade  generation  rules,  a  nexus  point  saliency 
detection  neural  network  (a  mapping  network 
trained  on  points  chosen  by  humans  as  being  good 
nexus  points)  is  used  at  each  step  to  determine  if 
an  object  of  interest  is  present  near  the  center  of 
the rosette. The inputs to this network are the
same multiresolution wavelet features used by the
saccade  generation  rule  base.  This  nexus  detection 
element  is  used  to  decide  whether  full  classification 
of  the  specific  set  of  rosette  features  is  called  for. 
The  ultimate  goal  is  to  compare  each  nexus  point 
rosette  feature  set  with  a  stored  library  of 
catalogued  features. 

The  comparison  or  matching  operation  needs  to 
be  carried  out  in  such  a  way  that  the  system  is 
insensitive  to  scale  changes  in  the  object  by  as 
much  as  a  factor  of  1/2  to  2  from  the  baseline 
scale,  rotations  of  the  object  within  the  plane  of 
the  image  around  the  center  of  the  rosette,  and 
small  changes  in  the  spatial  frequency  content  of 
the  object.  Methods  for  carrying  out  such 
matching  operations  are  known.  One  method  is 
graph  matching  [2,3,12].  In  terms  of  our  specific 
features,  the  essence  of  graph  matching  is  to  take 
the  unknown  feature  set  and  compare  it  with  each 
of  the  known  feature  sets  at  a  variety  of  scale  and 
rotation  offsets.  For  example,  we  might  take  the 
unknown  feature  vector  and  compare  it  (using  an 
abridged  Euclidean  distance  measurement)  with  a 
collection  of  auxiliary  feature  vectors  derived  from 
a  single  library  feature  vector.  The  auxiliary 
vectors  are  created  by  taking  the  library  vector  and 
rearranging  the  feature  values  to  correspond  to 
rotations of the foveal rosette in 22.5 degree
increments and scale changes of the rosette (by
factors or divisors of 2) across scales of 1/2 to 2.
The Euclidean distance measurement is abridged so
that components which would correspond to rings
that do not exist in the scaled rosette are ignored.
The outer ring is also often ignored, because its
features are used primarily for saccade generation.
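The rearrangement and abridged comparison described above can be sketched as follows. Several simplifications are assumed here and are not from the text: features are stored as one magnitude per orientation in a nested list indexed `[ring][spoke][orientation]`, the center point is left out, the distance is a mean squared difference rather than a raw Euclidean distance (so comparisons over different numbers of rings stay comparable), and all names are hypothetical.

```python
import random

N_RINGS, N_SPOKES, N_ORIENT = 3, 16, 8

def abridged_distance(unknown, library, rot, shift):
    """Mean squared difference between `unknown` and the library vector after
    the library rosette is rotated by `rot` spoke steps (22.5 degrees each)
    and rescaled by 2**shift. Rings with no counterpart are skipped."""
    total, count = 0.0, 0
    for ring in range(N_RINGS):
        src_ring = ring - shift
        if not 0 <= src_ring < N_RINGS:
            continue  # this ring does not exist in the scaled rosette
        for spoke in range(N_SPOKES):
            for orient in range(N_ORIENT):
                ref = library[src_ring][(spoke - rot) % N_SPOKES][(orient - rot) % N_ORIENT]
                diff = unknown[ring][spoke][orient] - ref
                total += diff * diff
                count += 1
    return total / count

def best_match(unknown, library):
    """Try all 16 rotations and the scale factors 1/2, 1 and 2."""
    candidates = [(abridged_distance(unknown, library, rot, shift), rot, shift)
                  for rot in range(N_SPOKES) for shift in (-1, 0, 1)]
    return min(candidates)

rng = random.Random(0)
library = [[[rng.random() for _ in range(N_ORIENT)]
            for _ in range(N_SPOKES)] for _ in range(N_RINGS)]
# Simulate the same object seen rotated by 3 spoke steps (67.5 degrees).
unknown = [[[library[r][(s - 3) % N_SPOKES][(o - 3) % N_ORIENT]
             for o in range(N_ORIENT)]
            for s in range(N_SPOKES)] for r in range(N_RINGS)]
print(best_match(unknown, library))  # (0.0, 3, 0)
```

Because rotation and scaling only permute indices, no image resampling is needed to generate the auxiliary vectors; this is one practical benefit of the regular ring and spoke spacing.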

Instead  of  using  Euclidean  distance,  another 
approach  would  be  to  use  a  neural  network 
comparison  module  that  has  been  trained  on  a 
large  volume  of  known  image  feature  data.  The 
output of the module is the determination of
whether the unknown feature vector and one
of the rotation/scale altered versions of the library
feature vector match sufficiently. The use of
a  neural  network  for  this  function  would  seem 
promising,  since  the  subtleties  of  the  matching 
operation  probably  will  allow  a  method  that 
utilizes  more  of  the  feature  content  to  do  better 
than  simple  Euclidean  distance  comparison. 

One  of  the  challenges  of  the  eyeball  vision 
method  is  to  find  a  way  of  matching  an  enormous 
number  of  library  vectors  with  a  particular 
unknown  feature  vector  in  a  small  amount  of  time. 
Cluster  trees  and  other  hierarchical  indexing  or 
content-addressable  memory  techniques  may  be 
useful  for  this  purpose. 

2.2.4  Feature  Vector  Library 

The  creation  of  a  feature  vector  library  for  a 
particular  set  of  objects  of  interest  might  seem  very 
difficult,  but  it  need  not  be.  All  that  is  needed  is  a 
labeling  of  nexus  points  on  objects  of  interest  in  a 
reasonably  large  set  of  images.  During  the  training 
process,  the  rosette  movement  rules  are  allowed  to 
generate  saccades  and  move  the  rosette  around  the 
images.  Human  observation  of  the  rosette’s 
behavior  can  be  utilized  to  improve  and  expand  the 
rule  base.  Neural  networks  can  also  be  trained  by 
humans  to  make  expeditious  saccade  commands. 
Whenever  the  center  of  the  rosette  touches  a 
labeled  object  of  interest  near  a  nexus  point,  the 
rosette  feature  vectors  are  captured  and  added  to 
the  library  with  a  tag  specifying  the  class  of  the 
object  with  which  the  vector  is  associated  in  the 
image.  Rybak’s  work  suggests  that  most  objects 
will  have  multiple  nexus  points.  All  of  the  feature 
vectors  from  these  points  would  typically  be 
gathered  and  stored. 

2.2.5  Foveal  Object  Detection  and  Recognition 
Architecture 

Figure  6  shows  a  hypothetical  foveal  object 
detection  and  recognition  system  architecture.  This 
system  is  now  described.  In  the  next  section  it  is 
compared  with  the  traditional  system. 
Figure 6: A foveal rosette image object detection and recognition system design.


As  shown  in  Figure  6,  the  same  image  as  used  in 
the  traditional  fixed-resolution  system  is  foveally 
sampled  using  a  rosette  with  three  rings,  16  spokes, 
and a center point (49 sampling points). Sixteen sine
and sixteen cosine Gabor logon wavelet features are
extracted  at  each  sample  point.  If  we  assume  that 
digital  processing  is  used,  then  each  wavelet  must 
be  computed  by  multiplying  each  pixel  of  a  wavelet 
template  mask  by  each  corresponding  pixel  value 
beneath  the  mask.  The  calculational  burden 
associated  with  these  operations  is  shown  in  the 
table  of  Figure  7.  The  current  position  of  the  foveal 
window  is  also  emitted  (this  position  is  obtained 
from  the  saccade  generator).  The  output  of  the 
foveal  feature  extraction  module  is  a  set  of  32 
features  at  each  of  the  49  sample  points  for  a  total 
of  1568  features  (one  byte  each). 
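The per-rosette extraction cost tabulated in Figure 7 can be reproduced with a short calculation. The sketch below takes its numbers from that table (wavelet sizes of 400, 400, 1,600 and 6,400 pixels, 2 arithmetic operations per pixel, and 16 wavelets per sample point); treating the 400-pixel template as a 20 x 20 square whose side doubles on rings 2 and 3 is an assumption, as is the function name.

```python
def rosette_extraction_ops(base_side=20, n_rings=3, n_spokes=16,
                           wavelets_per_point=16, ops_per_pixel=2):
    """Arithmetic operations needed to extract the features of one rosette."""
    # (template pixels, number of sample points) for the center point.
    rows = [(base_side ** 2, 1)]
    for ring in range(1, n_rings + 1):
        side = base_side * 2 ** (ring - 1)  # ring 1 reuses the center scale
        rows.append((side ** 2, n_spokes))
    return sum(pixels * ops_per_pixel * wavelets_per_point * n_points
               for pixels, n_points in rows)

print(rosette_extraction_ops())  # 4313600
```

The outermost ring alone accounts for about three quarters of the total, since its templates are sixteen times larger in area than those at the center.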

Following  feature  extraction,  the  nexus  point 
detector  module  uses  the  foveal  features  to 
determine  if  the  current  fixation  point  is  a  nexus. 
This  operation  is  assumed  to  be  carried  out  by  a 
multilayer  perceptron  neural  network  [11]  with 
1568  inputs,  50  first  hidden  layer  units,  50  second 
hidden  layer  units,  and  two  output  units  (one  each 
for  yes  and  no).  While  the  size  of  this  network  is 
just  a  guess,  experience  with  similar  problems 
(such  as  object  detection  using  regularly  sampled 
spatial  frequency  features)  suggests  that  a  network 
of  this  size  should  work  for  a  typical  image  object 
detection application. This network requires 83,652
operations to determine the nexus point
classification for a single fixation point (1569 x 50
+ 2 x 51 x 50 + 51 x 2 = 83,652, including bias
inputs).
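The operation count for the nexus point detector can be checked term by term. The sketch below simply evaluates the expression given in the text; the factor of 2 on the second hidden layer is taken from that expression as written, and the function name is hypothetical.

```python
def nexus_detector_ops(n_inputs=1568, hidden1=50, hidden2=50, n_outputs=2):
    """Evaluate the operation count quoted in the text for the nexus detector."""
    first = (n_inputs + 1) * hidden1       # 1569 x 50, bias input included
    second = 2 * (hidden1 + 1) * hidden2   # 2 x 51 x 50, as written in the text
    output = (hidden2 + 1) * n_outputs     # 51 x 2
    return first + second + output

print(nexus_detector_ops())  # 83652
```

The first layer dominates the count (78,450 of the 83,652 operations), so the cost of the detector scales almost linearly with the number of rosette features.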

If  the  fixation  point  is  judged  to  be  a  nexus  point 
(a  rare  event),  the  object  recognizer  module  is 
activated.  The  object  recognizer  uses  a  search 
procedure  (such  as  a  tree  search)  to  search  through 
a  large  feature  vector  library.  It  is  assumed  that 
100  comparisons,  each  requiring  3  x  1568  =  4704 
arithmetic  operations,  are  needed  to  complete  the 
search.  This  is  reasonable,  since  trees  can  be 
designed  to  keep  the  search  time  to  a  low  multiple 
of log N, where N is the number of example feature
vectors  stored  in  the  feature  vector  library 
(including  redundant  rotated  and  scaled  versions). 

Following  each  nexus  point  detection  operation, 
the  saccade  generator  module  selects  a  new  fixation 
point  (unless  it  judges  that  the  image  search  has 
been  completed).  The  operation  of  this  module  is 
assumed  to  involve  a  combination  of  both  rules  and 
neural  networks  having  a  combined  total 
computational burden four times as great as the
nexus point detector module, or 334,608 operations
per fixation point (this is a guess based upon the
saccade generation methods of Giefing [10], Rybak
[15], and Schmidhuber [18]).

            Wavelet Size   Arith. Ops  Number of    Number of
Ring         (in pixels)  (per pixel)   Wavelets  Sample Pts.  Total Ops
Center Pt.           400            2         16            1     12,800
1                    400            2         16           16    204,800
2                  1,600            2         16           16    819,200
3                  6,400            2         16           16  3,276,800
Total Ops.                                                     4,313,600

Figure 7: The calculations associated with derivation of the 1568 features of a single foveal rosette.

Let us assume (as we did with the constant
resolution system considered in the previous
subsection) that there are 12 objects in the image
and assume that there are 2 nexus points for each
object (i.e., half of the nexus points are judged by
the recognition module to not be objects of any of
the 40 classes of interest). This is reasonable,
because the nexus point detector will not be able
to do as detailed an analysis as the recognition
module. Let us further assume that there are a
total of 100 fixation points explored in the image.
We then get a total computational burden of
roughly 500 million arithmetic operations per
image (100 x 4,313,600 + 100 x 83,652 + 12 x 2
x 470,400 + 100 x 334,608 = 484,475,600). Note
that the calculational burden associated with
extraction of the foveal features is about 90% of
the total required computations. This illustrates
why it would be highly advantageous if a sensor
that directly extracts these features could be built.
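The per-image budget can be reproduced from the per-module figures quoted in this subsection. The sketch below uses only numbers stated in the text; the function and constant names are hypothetical.

```python
EXTRACTION_OPS = 4_313_600        # per rosette, from Figure 7
NEXUS_OPS = 83_652                # nexus point detector, per fixation point
SACCADE_OPS = 4 * NEXUS_OPS       # 334,608 per fixation point
RECOGNIZER_OPS = 100 * 3 * 1568   # 100 comparisons x 4704 ops = 470,400

def ops_per_image(fixations=100, objects=12, nexus_per_object=2):
    """Arithmetic operations per image for the foveal rosette system."""
    budget = {
        "extraction": fixations * EXTRACTION_OPS,
        "nexus": fixations * NEXUS_OPS,
        "recognition": objects * nexus_per_object * RECOGNIZER_OPS,
        "saccades": fixations * SACCADE_OPS,
    }
    budget["total"] = sum(budget.values())
    return budget

budget = ops_per_image()
share = budget["extraction"] / budget["total"]
print(budget["total"], round(share, 2))  # 484475600 0.89
```

Varying `fixations` shows that the total grows almost linearly with the number of fixation points explored, since recognition is triggered only at the comparatively rare nexus points.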

3  Computational  Complexity  Comparison 

In  this  section  the  real  time  object  detection  and 
classification  system  described  at  the  beginning  of 
Section  2  is  used  to  compare  the  constant 
resolution  and  foveal  approaches. 

3.1  The  Guidance  and  Control  Scenario 

We shall assume that the airborne object detection
and classification problem described at the
beginning of Section 2 is being used for guidance
and control of weapon systems on-board the
platform and/or of the platform itself. We shall
assume a need to process 5 frames of imagery per
second. To make the comparisons simple, we shall
imagine that all of the data flows shown in Figure 1
and Figure 6 occur on a single shared data bus
within the information processing subsystem.

3.2 Processing and Data Transfer
Comparisons

In the case of the constant resolution system we
have a total processing load of approximately 181
billion operations per second. The foveal system
will have a total processing load of 2.4 billion
operations per second. Thus, the foveal system is
almost two orders of magnitude faster than the
constant resolution system, assuming that both
systems are implemented in approximately the
same sort of hardware (see Figure 8).
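The per-second loads follow from the per-frame module costs of Figure 8 and the assumed rate of 5 frames per second. The sketch below repeats that arithmetic; the dictionary and function names are hypothetical, and the per-frame values are those listed in Figure 8, in millions of operations (MOPS).

```python
FRAMES_PER_SECOND = 5

# MOPS per frame for each module, as listed in Figure 8.
CONSTANT_RESOLUTION = {
    "Primary Feature Extractor": 36_000,
    "Object Detection": 22.5,
    "Secondary Feature Extraction": 165,
    "Object Classification": 3.1,
}
FOVEAL_ROSETTE = {
    "Foveal Feature Extractor": 430,
    "Nexus Point Detector": 0.8,
    "Saccade Generator": 33.5,
    "Target Recognizer": 11.3,
}

def mops_per_second(modules):
    """Total processing load in MOPS at the assumed frame rate."""
    return sum(modules.values()) * FRAMES_PER_SECOND

constant_total = mops_per_second(CONSTANT_RESOLUTION)  # about 181,000 MOPS
foveal_total = mops_per_second(FOVEAL_ROSETTE)         # about 2,378 MOPS
ratio = constant_total / foveal_total
print(round(constant_total), round(foveal_total), round(ratio))
```

With these inputs the ratio comes out at roughly 76, which is the sense in which the difference is "almost two orders of magnitude".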

In  terms  of  data  transfer,  if  we  ignore  the  image 
input  (which  is  the  same  for  both)  the  rates  for  the 
constant  resolution  and  foveal  system  operating  at 
5  frames  per  second  are  5.6  MBytes  per  second  and 
1.5  MBytes  per  second,  respectively.  Here  again, 
the  foveal  system  is  better. 

4  Operational  Implications 

The  operational  implications  of  the  comparison 
carried  out  in  Section  3  are  now  briefly  discussed. 

4.1  System  Envelope  Parameters 

The 2.4 billion operations per second processing
load  of  the  foveal  system  is  within  reach  of  existing 
or  near-term  processors,  as  is  the  associated  1.5 
MByte  per  second  data  bus  information  transfer 
rate.  Thus,  although  it  is  still  in  need  of  validation 
in  terms  of  its  performance,  the  foveal  approach  is 
well  within  the  computational  and  data  transfer 
rate  envelope  that  can  be  reasonably  postulated  for 
near-future  military  systems. 

In  contrast,  the  constant  resolution  system,  with 
its 181 billion operations per second processing load
and  5.6  MByte  per  second  data  bus  information 
transfer  rate,  will  be  more  difficult  to  implement  in 
real  time  hardware  in  the  near  future. 

5  Conclusions 

Clearly, the eyeball vision concept impacts more
than just cost. It introduces the possibility of using
knowledge regarding the spatial appearance and
characteristic detailed internal structure of objects
of interest.

Constant Resolution System

Module                        MOPS per Frame  MOPS per Second
Primary Feature Extractor             36,000          180,000
Object Detection                        22.5            112.5
Secondary Feature Extraction             165              825
Object Classification                    3.1             15.5
Total                                 36,200          181,000

Foveal Rosette System

Module                        MOPS per Frame  MOPS per Second
Foveal Feature Extractor                 430            2,150
Nexus Point Detector                     0.8              4.0
Saccade Generator                       33.5            167.5
Target Recognizer                       11.3             56.5
Total                                    476            2,378

Figure 8: The computational requirements of the constant resolution and foveal rosette systems.

Obviously, at this stage eyeball vision is little
more than a concept. However, it seems worthy of
further investigation, if for no other reason than
the potential for computational cost savings.

0 Introduction: why neural
networks  are  interesting  in  target 
recognition  problems 

Modern  strategic  surveillance  or 
autonomous  weapons  systems  have 
performance  requirements  that  imply 
the  use  of  new  and  innovative  data 
processing techniques. The ever
increasing number and sophistication
of modern threats, the availability of
large amounts of data coming from large
numbers of transportable and moving
sensor platforms, and the extremely
stringent real-time defense system
requirements have resulted in increased
demands on data and signal processing
systems, often overwhelming conventional
processing technologies.

The existence of larger numbers
of threats in a cluttered environment
and the existence of many false alarms
imply the use of real-time adaptive
algorithms. Classical approaches have
often led to costly, inflexible,
algorithm-intensive data processing
systems; they can only meet the
performance requirements through
high-cost developments of co-processors.


More precisely, target recognition
implies highly adaptive developments,
since the nature of targets differs from
one situation to another and the targets
themselves vary in time, for
example during the life of a weapon
system. Various pattern recognition
techniques, from the perspective of
sensor signal classification processing,
are necessary, for example to detect and
classify specific target signatures
buried in noisy, clutter-rich signals.

Neural network techniques,
because learning from examples is a
crucial phase, are well suited for
problems requiring adaptive
behaviour. By applying the same
architectures to learn various databases,
one can obtain developments at
relatively low cost. Moreover, good
fault tolerance is obtained, which is
particularly useful for signal processing
on cluttered and noisy signals. Finally,
neural networks are intrinsically
parallel algorithms, which allows
execution on parallel neural network
processors, which may provide the
answers to some of today’s most
formidable defense system processing
requirements.

1  Input  signals  and  databases 

Essentially four types of signals
are used for target recognition: radar
signals, infra-red images, sonar signals
and TV images.

In  each  type  of  signal,  several 
subtypes  can  be  described, 
corresponding  in  particular  to  the 
functionalities  of  the  system;  for 
example,  radar  signals  for  panoramic 
surveillance are very different from
radar  signals  used  in  target  detection  in 
weapon  systems;  moreover  sensors 
have  particularities  in  executing  the 
reception  phase,  which  includes 
filtering,  amplification,  and 
demodulation  of  the  signals,  these 
procedures  being  generally  analog. 

For the needs of neural network
applications, large databases are necessary
for the learning phase. Here comes the
first real difficulty, because these
databases have to be truly
representative of the problem to be
solved. Two options are then possible:
either one uses data obtained from
simulations, or one uses real data
recorded either in past conflicts or in
experiments made by the army or
the industrial groups interested in the
project. In both cases, some questions
are raised.

If one uses simulations, the
advantages are generally that one has as
much data as needed, that its cost
remains reasonable (generally the cost of
developing the simulation software),
that it is easy to make a
database that is statistically
representative of the data to be
processed, and that one can also take
into account some particular cases that
appear as rather exceptional. This leads to
software that solves the target
identification problem pretty well for signals
coming from the simulator. The
question is then: what about real data?
Are the data generated by the simulator
close enough to real data to ensure good
performance on real data? The answer
to these questions clearly depends on the
particular characteristics of the problem;
one can however say that it is relatively
easy to make simulations with shapes
close to real target shapes, but that the
main difficulty remains in the
simulation of noise and clutter;
experiments show that resistance to
artificial noise does not necessarily
imply resistance to real clutter.

If one uses real data, the
advantage is of course that the database
used for learning will have
characteristics close to the data used in
real tests. The drawback is generally
that, unless real databases have been
recorded for years for the problem
addressed, one has to record
new data to complete the database,
and this may imply very high costs.
Moreover, it may be simply impossible
to obtain a database that is
representative of all the exceptional
patterns that may occur in the data.
There is then little chance that the
system will be able to handle
exceptional cases that it has never met
before.

The best solution is most of the
time to use data coming from
simulations in the development of
prototypes; then, to build real
applications, to start from an existing or
reasonably priced database, and to
complete this real database with data
coming from simulation. This solution
is often the one offering the best
price/performance ratio.

It must also be emphasized that the
possibility of complementary learning
phases remains open and that,
consequently, it will always be
possible to improve the performance of
the system in particular
situations that had not been foreseen
originally.

2  What  preprocessing? 

The second problem that has to
be addressed is the choice of
preprocessing. First, is preprocessing
really needed? It is clear that neural
networks, in many applications,
perform very well on raw data. This is
particularly true in image processing,
less so in signal processing. However, if
one wants to deal with raw data, one
may have to sample the signal at a very
high frequency to obtain its numeric
representation; this implies a very large
memory size for the system, and a
very long learning time. So, to obtain
equivalent results at reasonable cost,
one needs some preprocessing.

But, again, this really depends on
the signal. For example, no
preprocessing is really necessary for
recognizing targets in TV images, while
preprocessing seems unavoidable in
most problems using sonar signals.

Most of the time, the problem of
target recognition has been studied for
a long time using various classical
methods. Adapted preprocessing was
then used, and experiments show
that the best preprocessing for classical
methods is also most often the best
preprocessing for neural methods. For
example, in the case of radar signals,
usual numeric preprocessing such as
pulse compression, Doppler filtering,
normalization, or thresholding with a
constant rate of false alarms has been
shown to improve the performance of
neural recognition.
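
The last preprocessing step listed above, thresholding with a
constant rate of false alarms, can be illustrated by a minimal
cell-averaging sketch. It is only one common variant and is not
taken from the cited studies; the function name `ca_cfar`, the
window sizes and the scale factor are assumptions chosen for
illustration.

```python
def ca_cfar(power, num_train=4, num_guard=1, scale=4.0):
    """Cell-averaging CFAR: flag cells whose power exceeds a scaled local noise estimate.

    power     -- list of non-negative power samples, one per range cell
    num_train -- training cells on EACH side of the cell under test
    num_guard -- guard cells on EACH side, excluded from the noise estimate
    scale     -- threshold multiplier (illustrative value, sets the false alarm rate)
    """
    n = len(power)
    detections = []
    for i in range(n):
        # Training cells on both sides, skipping guard cells and array edges.
        left = range(i - num_guard - num_train, i - num_guard)
        right = range(i + num_guard + 1, i + num_guard + num_train + 1)
        train = [power[j] for j in list(left) + list(right) if 0 <= j < n]
        if not train:
            detections.append(False)  # no neighbours: no noise estimate, no detection
            continue
        noise = sum(train) / len(train)
        detections.append(power[i] > scale * noise)
    return detections


# Hypothetical range profile: flat noise floor with one strong return in cell 10.
signal = [1.0] * 20
signal[10] = 12.0
hits = [i for i, flagged in enumerate(ca_cfar(signal)) if flagged]  # -> [10]
```

The flagged cells, or the thresholded signal, would then form the
input of the neural classifier.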

Again, the choice of
preprocessing itself depends on the
problem; for example, if you want to
distinguish between various
helicopters, the blade frequency is
one of the most discriminating
patterns, so that you will need Doppler
filtering.
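
To make the helicopter example concrete, the sketch below estimates
the dominant modulation frequency of a sampled return with a naive
discrete Fourier transform. This is a hedged illustration, not the
processing of any particular radar: the function name
`dominant_frequency`, the 64 Hz sampling rate and the 8 Hz
modulation are hypothetical values.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip bin 0 (the constant component)
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Hypothetical return: constant level plus an 8 Hz modulation, sampled at 64 Hz.
rate = 64
samples = [1.0 + 0.5 * math.cos(2 * math.pi * 8 * t / rate) for t in range(rate)]
peak = dominant_frequency(samples, rate)  # -> 8.0
```

A frequency of this kind, or the whole spectrum around it, is the
sort of feature that could be passed on to the classifier.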

But neural networks have also
proven useful both in the choice
of the preprocessing and in the
preprocessing itself. A few examples
follow.

The way neural networks
can be used to choose a preprocessing
has been studied in [16]. In this paper an
original two-stage architecture is
described: in the first stage, a first neural
network, taking the raw signal as input,
makes a pre-classification, identifying
the type of input signal and yielding a
good choice of signal processing
method; in the second stage, this
preprocessing technique is applied to
the signal to feed a second neural
network, which performs the precise
classification. In this example, the
"preclasses" are classes such as transient
sounds, ambient noise, and
quasi-stationary noises.
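
The two-stage idea can be sketched as follows. Everything in this
block is a placeholder: the two toy functions stand in for the
trained networks of [16], whose actual structure is not described
here, and the pairing of preclasses with preprocessing methods is
hypothetical.

```python
def moving_average(signal, width=3):
    """Smooth the signal; placeholder preprocessing for noise-like inputs."""
    return [sum(signal[i:i + width]) / width
            for i in range(len(signal) - width + 1)]


def first_difference(signal):
    """Emphasise abrupt changes; placeholder preprocessing for transients."""
    return [b - a for a, b in zip(signal, signal[1:])]


def toy_preclassifier(signal):
    """Stand-in for the first network: label by the largest sample-to-sample jump."""
    jump = max(abs(d) for d in first_difference(signal))
    return "transient" if jump > 1.0 else "ambient"


def toy_classifier(features):
    """Stand-in for the second network: threshold on the peak feature magnitude."""
    return "target" if max(abs(f) for f in features) > 2.0 else "no target"


# Hypothetical mapping from preclass to the preprocessing it selects.
PREPROCESSING = {"transient": first_difference, "ambient": moving_average}


def two_stage_classify(signal, preclassifier=toy_preclassifier,
                       classifier=toy_classifier):
    preclass = preclassifier(signal)             # stage 1: coarse signal type
    features = PREPROCESSING[preclass](signal)   # preprocessing chosen by stage 1
    return preclass, classifier(features)        # stage 2: precise classification


burst = two_stage_classify([0.0, 0.1, 5.0, 0.2, 0.0])
calm = two_stage_classify([0.5, 0.6, 0.5, 0.4, 0.5])
```

The point of the design is that the second stage never sees the raw
signal, only the representation selected for its preclass.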

An example of preprocessing
using neural networks concerns texture
analysis in infra-red images. Such a
procedure is described in [6]. For target
detection in infra-red images, texture
analysis is a very useful tool (while, for
example, because of the low dynamic
range and low signal-to-noise ratio,
contour detection is not successful). To
discriminate between textures,
two sorts of preprocessing are used:
multiresolution analysis by wavelet
transform, which provides interscale
level energies, and the grey-level
distribution. A multilayer perceptron
then performs the classification.

Another example of
preprocessing is described in [12]. To
improve the performance of pulse
radar detection, pulse compression
techniques, which involve the
transmission of a long-duration,
wide-bandwidth signal and its
compression to a narrow pulse, are
generally employed. A neural network
has been trained to perform this
compression, at a computational speed
higher than that of the traditional
approaches.

3  Extraction  of  features 

In all pattern recognition
problems, feature extraction has always
been a key problem. If you are able to
find discriminating characteristics of
patterns in a signal or an image, then
making the classification is generally
rather an easy task. Before the
introduction of neural networks, there
were essentially two ways of extracting
features for a classification problem:
linear algebra and experience.

The  only  available  mathematical 
method  was  linear  regression,  which  is 
still  the  best  method  to  be  used  when 
the  characteristic  features  can  be 
obtained  in  a  linear  way  from  the 
parameters  coming  from  the  sensor;  but 
this  means  the  problem  is  easy. 

In other cases, the best help for
extracting features is probably to use the
experience of experts in the domain.
They are generally used to looking for
particular patterns in the signal, and
their approach has proven successful, so
why not try to identify these particular
patterns? Even when neural networks are
used afterwards, this has proven to
save a lot of learning time. Moreover,
a good choice of features may provide
some invariance properties that
suit your application. In
target recognition, one generally wants
to have some translation, rotation or
scaling invariance; a suitable choice
of features may provide this property.
This is done in [9] (see § 5.2 below).

In some cases, various
preprocessing and feature extraction
methods have been applied to the same
problem; performances can then be
compared. This is the case in [18], for
automatic identification of pulse sonar
noises. The first approach is based on a
joint use of autoregressive modeling and
wavelet transform to obtain a reduced
set of parameters to feed the classifier
neural network. The second is based on
a two-dimensional (time-scale) signal
representation by compactly supported
wavelets as inputs for the network.

If backpropagation is certainly the
most popular algorithm in neural
networks, a key reason is its ability to
extract features automatically. In fact,
you can consider the first layer of a
multilayer perceptron as a feature
extraction program, dedicated to the
problem addressed. Moreover, the
procedure of shared weights makes it
possible to impose translation or even
scaling (with suitable preprocessing)
invariance on these features.
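
A minimal sketch of the shared-weights idea, in one dimension and
without any learning: the same small set of weights is applied at
every position of the signal, so a shifted pattern produces a
shifted feature map, and taking the maximum over positions (an
extra pooling step assumed here) makes the resulting feature
independent of the shift. The kernel values and function names are
illustrative assumptions.

```python
def shared_weight_layer(signal, kernel):
    """Apply the SAME weights at every position (1-D correlation, no padding)."""
    k = len(kernel)
    return [sum(w * signal[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal) - k + 1)]


def global_max(feature_map):
    """Pool over all positions: keeps the response, discards where it occurred."""
    return max(feature_map)


kernel = [-1.0, 2.0, -1.0]              # hypothetical peak detector
pattern = [0.0, 1.0, 3.0, 1.0, 0.0]
early = pattern + [0.0] * 5             # pattern at the start of the signal
late = [0.0] * 5 + pattern              # same pattern shifted by 5 samples

map_early = shared_weight_layer(early, kernel)
map_late = shared_weight_layer(late, kernel)

# The peak response moves with the pattern, but its value does not change.
same_feature = global_max(map_early) == global_max(map_late)
shift = map_late.index(max(map_late)) - map_early.index(max(map_early))
```

In a trained network the kernel values would be learned by
backpropagation, with the constraint that all positions share them.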

In [5], an example of the extraction
of visual features from lofar images is
given. The identification of underwater
acoustic noises is currently performed
essentially by human operators, either
by listening directly to the noise, or by
looking at the spectrogram of the noise
(lofar). A backpropagation neural
network has been used to extract visual
features from the lofar diagrams.

Most of the time, the features
that have been automatically extracted
have their justification in the
performance of the classification that
follows them. But sometimes, specific
signal features extracted by hidden units
of the network can be given an
interpretation. A good example is given
in [11]. The problem addressed there is
to classify sonar returns from an
undersea metal cylinder and a
cylindrically shaped rock of comparable
size. It can be shown that certain hidden
units correspond to an aspect-angle
independent classification, while others
correspond to an aspect-angle
dependent strategy, encoding in
particular specific spectral peaks or
nulls.

4  Neural  classification  techniques 

Backpropagation is certainly the
most popular algorithm for target
recognition problems, as it is for most
classification problems. In the
examples we are giving in §8,
backpropagation is used in [4], [5], [11],
[12], [13], [15], [18], [19], [20]. The main
reason for that, as was said previously,
is that backpropagation still works if the
preprocessing or feature extraction
performed before the
classification is not perfect. So, it is the
easiest way of building an application,
the main problems generally occurring in
the optimization of the learning time.

For example, in [4],
backpropagation has been applied to the
problem of detecting moving
targets in severely cluttered
environments from medium pulse
repetition frequency Doppler radar
signals. Its performance, when compared
with the conventional filter bank method,
proved to be much better, especially in
highly cluttered environments.

Another example is given in [15]
for the passive detection of target-like
signals in underwater acoustic fields.
The input to the neural network is an
intensity-modulated signal, which is a
measure of the power of the signal at
different frequencies as time varies. The
first stage of the system is an
autoassociative memory whose
function is to eliminate the noise. The
output of this first stage is input to the
second stage, which is a multilayer
perceptron. Performances are quite
promising.

Stochastic algorithms such as the
Boltzmann machine are used more
rarely, generally when the
characteristics of the application imply
that the cost function used for
the classification has several local
minima one wants to escape from. The
drawback of these algorithms is
that they are generally expensive in
computer time, so that their study is
often coupled with hardware
implementation.

This is the case in [2], where
synchronous Boltzmann machines are
implemented on a Connection
Machine, for classification of boat
outlines extracted from infra-red
images.

In some other cases, randomness
can be used to escape from flat portions
of the energy landscape, as is the case
in [22], where a stochastic variant of
backpropagation improves convergence
rates for a sonar target recognition
problem.

Learning Vector Quantization is
a typical classification algorithm,
probably the most efficient when used
properly; but it needs well
adapted features as inputs. In some
problems, the best results were obtained
by making a first classification using
backpropagation, then applying
Learning Vector Quantization to the
intermediate hidden units of the
backpropagation network. In [9],
Learning Vector Quantization is applied
to features that have been manually
extracted to ensure
translation, rotation and scaling
invariance (see §5.2 below).
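
For readers unfamiliar with the algorithm, here is a minimal sketch
of the basic LVQ1 update rule (not the LVQ2 variant, and not the
exact procedure of [9]): the prototype nearest to a training sample
is moved towards the sample if their classes agree, and away from
it otherwise. The data layout, function names and learning rate are
assumptions.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((p - v) ** 2
                                 for p, v in zip(prototypes[k][0], x)))


def lvq1_step(prototypes, x, label, rate=0.1):
    """One LVQ1 update: attract the winner if its class matches, repel it otherwise."""
    k = nearest(prototypes, x)
    vector, proto_label = prototypes[k]
    sign = 1.0 if proto_label == label else -1.0
    moved = [p + sign * rate * (v - p) for p, v in zip(vector, x)]
    prototypes[k] = (moved, proto_label)
    return k


def classify(prototypes, x):
    """Class of the nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]


# Each prototype is a (feature vector, class label) pair; values are hypothetical.
prototypes = [([0.0, 0.0], "A"), ([1.0, 1.0], "B")]
lvq1_step(prototypes, [0.2, 0.0], "A", rate=0.5)  # winner has the right class: attracted
lvq1_step(prototypes, [0.9, 0.9], "A", rate=0.5)  # winner has the wrong class: repelled
```

Because only distances between feature vectors matter, the quality
of the result depends directly on how well the input features
separate the classes.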

The neocognitron is a very powerful
algorithm, able to extract features
automatically, even with some invariance
properties. But it is not so popular,
because the architecture of the network
may be rather complicated, and the
results very dependent on the chosen
parameters. Most often, the architecture
of the network corresponds to a
decomposition into functionalities. In
[10], the neocognitron is applied to
detection, recognition, and
identification of targets in infra-red
images. It is shown that a neocognitron
can distinguish between tanks, cows
and haystacks, a difficult task when they
are viewed by an infra-red sensor.

Kohonen topological maps are the
most commonly used algorithm for
unsupervised target recognition
problems. In fact, target recognition
problems are not often
unsupervised, so that topological maps
are rarely used. One can however see an
example of their use in [14] (see the
description in §5.4 below).

Finally, Widrow's Adaline is
used in some cases, even if in most
cases backpropagation is preferred (see
[17] for example).

5  The  key  points 

In most target recognition
applications, some common difficulties
arise; one can mention four:

-  Multi resolution recognition

-  Invariance by translation,
rotation, scaling

-  Movement detection

-  Global situation analysis

5.1  Multi  resolution  recognition 

Targets may be far or close, big or
small, and the accuracy of the signal may
change due to noise or clutter, so
that the scale at which one has to use
the signal may vary. The most popular
tools for taking these sorts of problems
into account are Gabor
functions and wavelet functions.

In [7], a multiresolution
segmentation technique is developed
for signals and images, combining
wavelets and neural networks.
Multiresolution analysis allows
localization of different contours at
different scales. Thanks to this
localization, which characterizes the
smoothness of the contour, one can
hope to distinguish objects with
different resolutions.

A hierarchical organization of
feature vectors constructed from Gabor
convolutions with infra-red images at
different orientations and resolutions is
used in [8] for tank recognition.

5.2  Invariance  by  translation,  rotation, 
scaling 

Invariance by translation,
rotation, scaling is important since
targets are moving objects to be
recognized whatever their position,
distance or orientation. Invariant
feature extraction is thus an important
factor. Even if shared-weight
backpropagation can bring a partial
answer to this, abstract features are
often defined.


In [9], invariant target recognition
is performed. The features are defined a
priori; some of these features are, for
example, the total number of
pixels with value 1, and the sum of the
products of pixels which are at the same
distance from a designated origin, but
90 degrees apart. Kohonen's Learning
Vector Quantization 2 technique is then
applied to these features and gives very
good performance in identifying
silhouette images of targets.
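
The two features quoted above can be sketched for a binary image as
follows. The sketch assumes a square image and takes the image
centre as the designated origin, which is an assumption made here
and not a detail taken from [9]; function names are illustrative.
Both values are unchanged when the silhouette is rotated by a
quarter turn.

```python
def count_ones(image):
    """Total number of pixels with value 1 in a binary image."""
    return sum(sum(row) for row in image)


def rotate90(image):
    """Rotate a square image by a quarter turn (clockwise)."""
    n = len(image)
    return [[image[n - 1 - c][r] for c in range(n)] for r in range(n)]


def quarter_turn_correlation(image):
    """Sum of products of each pixel with the pixel at the same distance
    from the image centre but 90 degrees away."""
    n = len(image)
    return sum(image[r][c] * image[n - 1 - c][r]
               for r in range(n) for c in range(n))


# Hypothetical L-shaped silhouette.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
turned = rotate90(image)
features = (count_ones(image), quarter_turn_correlation(image))
features_turned = (count_ones(turned), quarter_turn_correlation(turned))
```

A vector of such hand-defined values, rather than the pixels
themselves, is what the classifier receives.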

5.3  Movement  detection 

Targets are moving. Sensors
generally give a picture, including the
position of various targets. But
recognition tasks are much easier if
these positions are correlated
from one picture to the next.
This task of tracking, or
extracting trajectories, is always
important, and is more difficult if the
picture rate is low compared to
the speed of the targets, as in the case of
some radars, for example.

In [1], visual information about
the motion of objects in an image is
obtained, including the description of
the trajectories. A neural network
implementation of the so-called
novelty filter makes it possible to detect
the motion of objects in a scene and to
record the corresponding trajectories.
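
The principle of a novelty filter can be sketched as follows,
assuming the classical formulation in which the filter outputs the
part of the input that is orthogonal to the space spanned by
previously seen patterns. This is not the network of [1]; frames
are flattened to short vectors and all names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(patterns, tol=1e-12):
    """Gram-Schmidt basis of the space spanned by the familiar patterns."""
    basis = []
    for p in patterns:
        v = list(p)
        for b in basis:
            coeff = dot(v, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip patterns that add no new direction
            basis.append([vi / norm for vi in v])
    return basis


def novelty(x, basis):
    """Component of x orthogonal to the familiar subspace: the 'new' part."""
    residual = list(x)
    for b in basis:
        coeff = dot(x, b)
        residual = [ri - coeff * bi for ri, bi in zip(residual, b)]
    return residual


# Hypothetical scene: a uniform background, then an object appearing in cell 3.
background = [1.0] * 6
basis = orthonormal_basis([background])
frame = [1.0, 1.0, 1.0, 4.0, 1.0, 1.0]
residual = novelty(frame, basis)
object_cell = max(range(len(residual)), key=lambda i: abs(residual[i]))  # -> 3
brighter_background = novelty([2.0] * 6, basis)  # familiar pattern: close to zero
```

Applied to successive frames, the cell with the largest residual
gives one point of the trajectory per frame.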

5.4  Global  situation  analysis 

Another difficulty in using all the
information obtained is that, in many
cases, isolated information concerning
one target is not enough. A decision to
attack a target may depend on the
existence of other targets around it; the
spatial relations between several targets
often give important indications about
their intentions.

A global analysis tool is still
something for the future; however, some
prototypes are being developed on the
subject, such as in [14], where
recognition and reconstruction of
spatially related groupings of various
objects is addressed. The recognition and
reconstruction properties are invariant
under input patterns that are translated,
distorted, incomplete and rotated by 30
degrees with respect to the training
patterns. The algorithm is a
combination of Fukushima's
neocognitron and Kohonen's
multi-layered multi-topological feature
maps.

6  Integration 

The biggest difficulty in
introducing new technologies into big
systems has always been integration.
This is true as well for weapon systems
and neural networks. In fact, two levels
of integration difficulty appear:
the integration in the information
processing part of the system, and the
integration in the whole system itself.
A third level of difficulty is not
addressed here, but has to be mentioned:
as neural network programs are made by
learning from examples, the software
engineering cycle imposed by military
administrations, as well as the usual
validation procedures, are not
applicable. New agreements have to be
found on this subject between military
administrations and the weapon systems
industry.


6.1  Integration  of  various  neural  and 
non  neural  modules 

Various functionalities have to
be performed in the computers of
weapon systems. Some of them, as
seen earlier, may be well performed by
using neural networks. But all this
would be of no use if neural network
developments could not be integrated
with one another
and with other modules. Fortunately,
lessons from expert systems have been
learnt, and integration is a high priority
for most neural network tools.

An example of integration of
various neural algorithms used in
panoramic surveillance is given in [13].
In this application, several multilayer
perceptrons trained using
backpropagation are used for image
prediction, pattern and image
classification, and image compression.
Also, a model derived from simulated
annealing solves the tracking problem.

A combination of classical and
neural algorithms, from noise
removal to identification, is presented
in [21]. A preprocessing stage removes
noise  from  the  imagery  using  data 
fusion,  and  performs  automatic 
detection  to  obtain  a  range  slice  of  the 
object.  The  object  is  then  normalized 
for  scale,  rotation,  and  translation  in 
the  field  of  view.  Oriented  receptive 
fields  are  applied  to  extract  edge 
strengths,  followed  by  a  neural  network 
that  does  boundary  completion.  The 
object  shape  thus  obtained  is  then  the 
input  of  a  neural  network  based 
classification  stage  that  identifies  the 
object. 




6.2  Integration  in  systems 

When one wants to define a
neural network module, as seen in § 1,
a strong constraint is the availability of
databases. This may lead to choices that
are not always compatible with the
functionalities of the whole system, as
most of the time the available data has
not been recorded especially for the
neural network module.

Two examples of a good
integration of neural modules within
the functionalities of the whole weapon
system are given. In [19], a
backpropagation module is used to
ensure the load limitation of a radar
plot extractor system. The network
differentiates between true and false
plots before the tracking function is
performed. This makes it possible to
reserve the tracking function, which is
expensive in computer time, for the
true plots.

In [20], a target recognition system
based on neural networks is described,
as well as its integration in the system.
In this system, target recognition is
performed on infra-red images in two
steps. In the first step, potential targets
are classified as targets or false alarms,
again to reduce the computing
load; in the second step,
classification of targets as planes,
helicopters or missiles makes it possible
to adapt the tracking algorithms, to give
priorities to the various targets, and to
give a better evaluation of the threat.

Finally, concerning integration,
the real-time constraints justify the use
of dedicated parallel neural hardware.
Up to now, the technology for building
neural hardware has been more
developed in research laboratories than
in operational integration teams, so
that,

7  Conclusion 

Neural networks are certainly a
very promising technique for target
recognition, because of their
adaptability, their fault tolerance and
their real-time potential due to their
parallelism. While, as in most
applications of neural networks,
database availability, choice of
preprocessing, and feature extraction
are important to keep the amount of
time necessary for learning within
reasonable limits, the key factors for the
success of the applications are
multi-resolution recognition capabilities,
invariance of recognition by
translation, rotation and scaling, and
movement detection capability. The
integration of neural modules in
weapon systems requires new
validation processes, as well as a careful
study to make the neural modules
compatible with the sequence of
functionalities of the system.

Backpropagation is certainly the
most often used neural algorithm,
because of its ability to extract
features. Various comparisons of
performance with classical methods
have been made on some examples.
One is given in [17], where neural
networks outperform classical
algorithms for some problems of
classification of natural underwater
sounds. But there is no general rule,
and in fact, most of the time
performance mainly depends on how
representative the database is.

ABSTRACT


Vision systems are finding widespread use in such areas as autonomous robotics and in more mundane
situations for the interpretation and/or identification of objects in images generated by various sensors.

This tutorial presents an overview of the various areas in which such systems have proven successful
and an introduction to the underlying theory.

The  human  vision  system  seems  to  be  composed  of  a  set  of  pre-attentive  filters  located  in  the  retina 
which do an immediate data reduction by computing a set of features (a feature vector). These features
are  transmitted  to  the  brain  for  interpretation  as  images.  Synthetic  vision  systems  are  based  on  the  same 
functional  decomposition  of  feature  extraction  followed  by  interpretation. 

The use of pre-attentive filters for synthetic vision systems has gained wide acceptance and produced
some impressive results. The concept of pre-attentive filters is introduced and the Gabor and the
Fourier-Mellin filters are shown as typical examples.

Several types of neural nets, given the appropriate input data, can be trained as interpreters to classify,
complete  and  identify  patterns.  Several  architectures  are  explored  for  these  applications. 

The first class of applications exploits the mapping characteristics of neural networks. This ability leads
to a set of applications in pattern classification, pattern completion and pattern recognition. The second
is in the more difficult field of object (target) recognition. Experimental results in image compression
and target identification are drawn from the literature.

It is suggested that the techniques for creating vision systems appear to be applicable to a very large class
of problems not normally associated with 'seeing' as we normally consider it.

INTRODUCTION 

The  goal  of  replicating  the  capabilities  of  the  human  vision  system,  or  perhaps  more  ambitiously 
the  vision  systems  of  various  other  animals  with  superior  capabilities,  is  undergoing  some  form  of 
realization  at  this  time.  Electronic  vision  systems  with  some  of  the  capabilities  of  animals  are 
being  routinely  accomplished. 

While  a  complete  electronic  vision  system  that  simulates  the  capabilities  of  animals  may  seem  a 
desirable  goal,  in  most  cases  some  specific  subfunction  is  all  that  is  required.  Robots,  for  example, 
need  only  'see'  what  is  required  to  perform  their  function.  This  may  only  demand  the 
identification  of  a  hole  in  a  casting  into  which  some  part  is  to  be  inserted.  In  other  cases,  only 
predetermined shapes or objects need be identified. Thus, in most cases, while researchers may
seek  biologically  emulated  electronic  systems,  a  vastly  lower  order  of  functionality  is  usually  what 
emerges  in  practice. 

Animal  vision  systems  are  composed  of  two  main  functional  partitions.  The  first,  in  the  eye, 
consists  of  a  vast  array  of  pre-attentive  filters  located  in  the  retina  which  are  either  genetically 
coded  or  trained,  early  in  life,  to  recognize  certain  attributes  of  the  light  energy  they  receive.  The 
output  from  the  filters  forms  a  feature  vector  (a  coded  representation  of  the  image),  which  is 
transmitted  to  the  brain.  The  brain  interprets  this  code  and  creates  an  image.  The  visual  richness 
of  the  resulting  image  depends  on  the  evolutionary  demands  that  have  been  placed  on  the  species. 
A  frog,  for  example,  seems  to  see  only  motion  qualified  by  some  indication  of  mass.  The 
interpretation  of  these  images  is  very  simple;  small  things  you  try  to  eat,  and  large  things  you  try 
to  escape.  The  human  system,  we  assume,  has  responded  with  the  most  complex  and  valid 
representation  of  the  external  world  both  through  our  coding  mechanisms  and  our  interpretative 
capabilities.  On  the  other  hand,  it  is  very  conceivable  that  we  are  missing  many  subtleties  in  the 
surrounding  world. 

Research in vision systems seems to have been concentrated in three general areas: understanding
and proposing models of the animal system; modelling the generation of feature vectors; and
training neural networks to recognize certain attributes of an image. It is the latter two we are
interested in. The modelling approach attempts to create feature vectors which represent the
image with such fidelity that it can be reproduced (this is most useful in transmission and storage),
or which enhance certain attributes useful for classification or for object recognition. This latter
capability is perhaps of most interest to those concerned with guidance and control.

This  paper  is  organized  in  four  main  parts:  In  the  first  we  will  review  a  model  of  animal  vision 
and  from  this  propose  a  model  for  electronic  vision  systems.  In  the  second,  we  will  review  neural 
computation  from  the  point  of  view  of  image  processing.  In  the  third,  we  address  the  use  of 
neural  networks  to  classify  or  identify  objects  presented  to  the  input.  Finally,  in  the  fourth,  we 
will  address  the  use  of  pre-attentive  filters  used  to  more  closely  simulate  animal  vision. 


VISION  SYSTEMS 


Animal  Vision 

A model of an animal vision system is shown in Figure 1. In this model, the image is decomposed
by a large array of sensors which become trained to recognize attributes of the environment (such
as vertical stripes or bars) based on the characteristics of the received light. These sensors have
been shown to have a response which is similar to a two-dimensional sinusoid, damped with a
(two-dimensional) Gaussian decay function. This function, originally proposed by Gabor, has the
unique property of a minimal joint spread in the space (or time) domain and the frequency domain,
the two being related by a Fourier transform. The functions are
called Gabor-Logons, after Gabor [B-1], who studied these functions in communications theory.
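
A hedged sketch of such a sensor response: a cosine wave along one direction multiplied by an
isotropic Gaussian envelope. The parameter names and default values (`sigma`, `frequency`, `theta`,
`phase`) are assumptions for illustration and are not taken from [B-1] or [C-1].

```python
import math


def gabor(x, y, sigma=2.0, frequency=0.25, theta=0.0, phase=0.0):
    """2-D Gabor function: a sinusoid along direction theta under a Gaussian envelope."""
    envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
    along = x * math.cos(theta) + y * math.sin(theta)  # coordinate along the wave direction
    return envelope * math.cos(2.0 * math.pi * frequency * along + phase)


def gabor_kernel(half_size=4, **params):
    """Sample the function on a square grid of (2 * half_size + 1) pixels per side."""
    return [[gabor(x, y, **params) for x in range(-half_size, half_size + 1)]
            for y in range(-half_size, half_size + 1)]


def filter_response(patch, kernel):
    """One component of the feature vector: inner product of image patch and kernel."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Hypothetical patch of vertical bars with a period of 4 pixels.
patch = [[math.cos(2.0 * math.pi * 0.25 * x) for x in range(-4, 5)] for y in range(-4, 5)]
matched = filter_response(patch, gabor_kernel(theta=0.0))          # tuned to vertical bars
crossed = filter_response(patch, gabor_kernel(theta=math.pi / 2))  # tuned to horizontal bars
```

The filter tuned to the orientation of the bars responds far more strongly than the one turned by
90 degrees, which is the sense in which such a sensor "recognizes" vertical stripes.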

A feature vector is generated based on the output of these pre-attentive filters and conveyed to the
brain along the optic channel. The brain interprets the signals and creates an image. The
interpretation process is partially genetic, and partially dependent on training. Daugman [C-1] and many
others have shown the validity of this model by actual measurements on the eyes of various animals.
While  the  process  seems  almost  unbelievable  in  its  complexity,  upon  reflection  it  seems  an 
eminently  sensible  way  of  reducing  the  image  data  to  an  essential  subset  which  can  be  processed  in 
some  reasonable  time. 

Machine  Vision  -  A  General  Architecture 

Systems for emulating animal vision systems have a similar architecture, as shown in Figure 1. The
sensors could be physical elements producing a characteristic of the image, or simulated elements
whose outputs are derived by a computation on the input image. Sensor outputs are fed to a
processing element which acts directly to produce results (such as classification), or to an interpreter
for subsequent processing. In the case of simulated filters, images are usually captured by some
form of scanner which produces a pixel stream representing light intensity and/or color. The
sequence of pixels becomes the synthetic image presented to the computational procedure. The
characteristics of the sensors and their number depend on the application.


A  PARTICULAR  VIEW  OF  NEURAL  COMPUTATION 


All  The  World  is  a  Vector 

Neural  computing,  in  all  its  paradigms,  assumes  some  form  of  vector  input  and  produces  a  vector 
output. The interpretation of the vectors and the processes of responding to the input vector vary
widely; however, the basic view remains unaltered.

In  order  to  provide  an  image  input  to  a  neural  network,  it  is  necessary  to  reduce  the  image  to  a 
vector.  This  is  usually  done  by  some  form  of  raster  scan  in  which  the  pixels  become  the  vector 
components.  The  generation  of  pixels  depends  on  the  sensor  and  on  the  problem.  For  example  in 
scanning  a  satellite  image  of  clouds,  a  one  kilometer  square  is  averaged  to  produce  a  pixel. 
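
As a minimal sketch of this reduction, the following Python fragment averages square blocks of
samples into pixels and then raster scans the result into a vector. The function names, the block
size and the sample values are illustrative assumptions and are not taken from any particular
sensor or system.

```python
def block_average(image, block):
    """Average non-overlapping block x block squares to form coarser pixels."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % block, block):
        out_row = []
        for c in range(0, cols - cols % block, block):
            total = sum(image[r + i][c + j]
                        for i in range(block) for j in range(block))
            out_row.append(total / (block * block))
        out.append(out_row)
    return out


def raster_scan(image):
    """Concatenate rows, left to right and top to bottom, into one vector."""
    return [pixel for row in image for pixel in row]


# Hypothetical 4x4 array of raw samples; each 2x2 square becomes one pixel.
samples = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
vector = raster_scan(block_average(samples, 2))
print(vector)  # [2.0, 6.0, 2.0, 8.0]
```

The length of the resulting vector, and hence the size of the input layer, is fixed by the
averaging area chosen for each pixel.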

The interpretation of the output vector depends on the problem. In image classification, for
example, the output would represent the estimate of which class the image is from. In target
recognition, the output would be an estimate of which class of targets the object is from.

The  initial  task  of  the  system  designer  is  to  decide  on  the  format,  size  and  interpretation  of  the 
input  and  output  vectors,  and  then  on  the  appropriate  neural  paradigm  needed  to  generate  the 
transformation.

The neuron in most paradigms computes a distance function between its internal weights and the
incoming vector. This is usually an inner product or the length of a vector difference. The resulting
number represents how close the input is to the neuron's weights. The output of the middle layer
of a feedforward network is a set of numbers representing the closeness of the input vector to
each of the neuron weight vectors. This vector must be processed to produce the desired output.
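
The two closeness measures can be sketched as follows. This is an illustration under simple
assumptions (unit-length weight vectors, no bias term and no output nonlinearity), with
hypothetical function names, and is not a description of any specific paradigm.

```python
import math


def inner_product(weights, x):
    """Similarity: larger means closer when both vectors have unit length."""
    return sum(w * v for w, v in zip(weights, x))


def difference_length(weights, x):
    """Distance: length of the difference vector; smaller means closer."""
    return math.sqrt(sum((w - v) ** 2 for w, v in zip(weights, x)))


def middle_layer(weight_rows, x, measure):
    """One closeness value per neuron, i.e. the middle-layer output vector."""
    return [measure(weights, x) for weights in weight_rows]


# Three hypothetical neurons, each holding a unit-length weight vector.
neurons = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
x = [0.6, 0.8]  # input vector, identical to the third neuron's weights

print(middle_layer(neurons, x, inner_product))      # largest value for neuron 3
print(middle_layer(neurons, x, difference_length))  # zero distance for neuron 3
```

Note that the inner product is only a fair measure of closeness when the vectors are normalized;
otherwise a neuron with large weights responds strongly to every input.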

When  considering  image  systems  it  is  necessary  to  consider  the  effects  of  the  volumes  occupied  by 
the  class  of  images  and  by  the  desired  responses  of  the  system. 

Hyperspace  and  Hypervolumes 

The  world  of  images  can  be  considered  to  occupy  an  n-dimensional  space  where  each  pixel  is 
interpreted  as  a  basis  vector  in  the  space.  In  a  real  sense  then,  an  image  is  a  vector  and  a  class  of 
similar  images  could  be  considered  to  occupy  a  volume  in  image  space.  For  convenience, 
multidimensional  spaces  are  referred  to  as  hypervolumes  to  indicate  their  n-dimensional  character. 
This  distinction  is  important  to  remember  since  the  intuitive  extension  of  our  concept  of  volumes 
does  not  prove  valid  in  n-dimensional  space.  The  sphere  is  the  only  volume  that  preserves  its 
intuitive  shape  and  metrics  (volume,  radius,  circumference,  etc.).  The  cube  for  example  becomes 
a  multipointed  star. 
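
One way to read the remark about the cube is sketched below, using only standard geometry. For a
cube of unit side centered on the origin, the face centers stay at distance 1/2 in any dimension,
while the corners move out to a distance of sqrt(n)/2, and the inscribed sphere fills a vanishing
fraction of the cube. The function names are illustrative.

```python
import math


def face_distance(side=1.0):
    """Distance from the cube center to the center of any face."""
    return side / 2


def corner_distance(n, side=1.0):
    """Distance from the cube center to any corner in n dimensions."""
    return side * math.sqrt(n) / 2


def inscribed_ball_fraction(n):
    """Volume of the inscribed n-ball (radius 1/2) over the unit cube's volume."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * 0.5 ** n


for n in (2, 3, 10, 100):
    print(n, face_distance(), corner_distance(n), inscribed_ball_fraction(n))
```

For an image of even modest size, n is in the thousands, so almost all of the cube's volume lies
far out toward the corners, which is why low-dimensional intuition about volumes fails.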

Image classes typically occupy very convoluted volumes in image space, which demands a complex
partitioning mechanism in order to separate and identify a particular class of images. Multilayer
neural networks would seem an ideal mechanism to accomplish this, since, in theory, a multilayer
network can create arbitrarily complex partitions. In practice, there are a number of
difficulties, the most significant being that successful training demands a representative set of
training examples which will expose to the neural network the complexity of the image volume,
and define strictly the boundaries between distinct volumes. Since the shape of the image space is
impossible to define, the selection of representative images also becomes very difficult to
guarantee.

In  addition,  very  large  image  spaces  (say  128x128  pixels  or  higher)  demand  relatively  large  neural 
nets  and  there  is  no  theoretical  way  of  predicting  the  exact  size  or  topology  (number  of  layers  and 
number of neurons per layer). Despite the theoretical capabilities of multilayer neural networks,
the reality is that training by backpropagation (of an error) through many layers becomes
ineffective, since the error, as it propagates backward, becomes less and less meaningful. Thus
multilayer networks become extremely difficult to train.

The result of these and other factors is usually that the space is partitioned in such a manner
that the partition between classes is only approximate and some intrinsic error always
remains. In most cases, there is a need to reduce the dimensionality of the image space by
extracting  a  feature  vector  which  preserves  the  essential  features  of  the  image  needed  for  the 
particular  application. 

We will look at two applications to illustrate these concerns: classification and target recognition.


CLASSIFICATION  AND  RECOGNITION 


Introduction 

The classification and the recognition problems have similar attributes; however, the problems are
essentially different. For our purposes we will assume that the classification problem will refer to
a  situation  in  which  a  number  of  classes  of  images  exist  in  which  members  of  the  same  class  share 
some  similar  attributes.  The  problem  becomes  to  view  a  new  image  and  assign  it  to  one  of  the 
classes.  Recognition  usually  refers  to  the  (more  difficult)  problem  of  viewing  an  object  within  a 
scene  and  assigning  it  to  a  class  of  objects.  In  the  first  case,  the  image  is  usually  homogeneous, 
while  in  the  second  the  object  can  be  arbitrarily  located  in  some  form  of  background  clutter. 

In  the  case  of  object  recognition,  the  object  must  be  found  before  it  can  be  recognized.  The 
recognition  algorithm  must  be  insensitive  to  translation  (both  horizontal  and  vertical).  It  must  in 
general  also  be  insensitive  to  rotation  and  scale  changes  of  the  object.  These  constraints  impose 
very  difficult  requirements  on  the  recognition  algorithm.  In  the  case  of  classification,  the  scene 
may also be rotated, and translation may be imposed because of the starting point of the picture. In
general, recognition is a more difficult problem than classification.

Image (Vector) Classification

Image classification is the task of placing an unknown image into one of a set of predefined image
classes. The classification of clouds from satellite images, sea state from radar returns, or ground
cover are typical examples.

The classification of cloud images using neural computing paradigms, for example, seems an ideal
application. The classification requires trained meteorologists; the exact class differences, while
recognizable, are difficult to quantify; and many data sets are available for training and testing the
neural network. In addition, there are many examples of classified images available, and even
results from other approaches to the classification problem, to provide comparisons of
performance. Based on these problem attributes, it would be assumed that the problem is an
ideal candidate for neural technology.

The major difficulty with the application of neural networks to the classification of images is the
massive data sets required to describe an image (say 100x100 pixels or larger). This introduces
major problems:

1. The computational load is immense, and adequate artificial neural network simulators
running on reasonable computers become very slow in training and operation.

2. The error surface during training becomes many-dimensional and very convoluted, so
that training may or may not converge in finite time.

3. The shape of the image volumes becomes impossible to predict, and as a result the design
of the neural topology to achieve an acceptable partition is subject to a trial and error approach.

4.  The  selection  of  training  examples  is  difficult,  since  the  training  set  must  be  both 
representative  (of  the  density  in  a  complex  image  volume)  and  be  chosen  to  achieve 
rotational  and  translational  invariance  of  the  images. 

The  first  two  problems  have  been  traditionally  attacked  by  preprocessing  the  raw  image  data  to 
obtain  a  lower  dimensional  feature  vector,  which  adequately  represents  the  original  image.  The 
goal  is  to  generate  a  feature  vector  which  has  lower  dimensionality  than  the  image  data  and  which 
retains  all  the  essential  attributes  of  the  image  for  subsequent  processing.  The  latter  two  problems 
offer  a  most  difficult  challenge  in  characterizing  the  training  and  test  sets. 

Object  (Target)  Recognition 

Object  recognition  is  a  term  used  to  describe  the  task  of  picking  an  object  or  class  of  objects  from 
an  image.  In  this  application,  it  is  expected  that  the  interpretation  mechanisms  will  be  presented 
with  a  feature  vector  and  the  output  will  be  a  decision  on  the  existence  and  possibly  the  location  of 
a member of a class of objects. In many cases, depending on the complexity of the system,
estimates of the existence of an object can be made even when it is obscured by screens.

The  major  difficulty  is  that  the  background  is  essentially  clutter  from  which  the  objects  must  be 
located  and  identified.  This  usually  involves  finding  masses  of  distinguishing  features  and  then 
creating  a  negative  in  black  and  white,  followed  by  a  search  for  the  shape  of  each  mass.  The 
masses are isolated and features of each are created for subsequent identification.
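
The isolation step can be sketched as follows. We assume here that the black and white negative
is produced by a simple intensity threshold and that a mass is a set of 4-connected pixels; both
are illustrative choices, as are the function names and the two features (area and centroid).

```python
from collections import deque


def threshold(image, level):
    """Black and white image: 1 where the pixel reaches the level, else 0."""
    return [[1 if p >= level else 0 for p in row] for row in image]


def find_masses(binary):
    """Group 4-connected foreground pixels into masses (breadth-first search)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    masses = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    i, j = queue.popleft()
                    pixels.append((i, j))
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and binary[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
                masses.append(pixels)
    return masses


def describe(pixels):
    """Two simple features of a mass: its area and its (row, column) centroid."""
    area = len(pixels)
    return {"area": area,
            "centroid": (sum(i for i, _ in pixels) / area,
                         sum(j for _, j in pixels) / area)}


# Hypothetical scene: two bright objects on a background with weak clutter.
scene = [
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 8, 8],
]
for mass in find_masses(threshold(scene, 5)):
    print(describe(mass))
```

The features of each mass, rather than the raw pixels, would then be passed to the classifier.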

Image  (Vector)  Completion 

Image  completion  is  a  simple  extension  of  the  recognition  problem.  In  this  application,  the  image 
is  first  classified  and  the  classification  is  used  to  drive  the  display  of  a  prototype  of  the  class.  A 
typical  application  is  to  display  the  complete  image  of  a  partially  concealed  object,  such  as  a  gun  or 
a vehicle. The problem here is to define the class boundaries in such a way that an incomplete
vector will terminate in the hypervolume assigned to the class of objects.

PRE-ATTENTIVE FILTERS AND VISION SYSTEMS


Introduction 

In  classical  statistical  analysis  of  large  images,  it  is  common  to  derive  a  feature  vector  which  is 
assumed  to  describe  the  essential  attributes  of  the  image.  It  is  assumed  that  images  in  a  class  will 
have  features  that  are  grouped  in  a  lower  dimensional  hypervolume  than  the  original  image.  The 
feature vector is then subjected to statistical analysis to separate classes, usually by some form of
linear discriminant measure. A well chosen feature vector will maintain a one-to-one mapping
between  the  image  classes  and  the  feature  vector  space,  and  substantially  reduce  the  computational 
requirements  for  subsequent  classification.  Garand  [C-1],  for  example,  has  proposed  a  set  of 
thirteen  features  which  are  used  to  classify  cloud  images  with  an  accuracy  in  the  high  80%  range. 

In a machine vision system, the image is captured by a scanning technique and represented by pixels
(grey scale or color), which in turn are used as inputs to computational elements, each of which
computes an element of the feature vector. The elements are called pre-attentive filters or sometimes lenses.

Pre-Attentive  Filters  -  The  Concept 

Any image can be considered as a projection onto a set of basis vectors $\{L_i(x,y)\}$, where $\{x,y\}$
represent the Cartesian location of the image pixel in image space. The resulting image becomes:

$$r(x,y) = \sum_i a_i L_i(x,y)$$

If $\{L_i(x,y)\}$ is a complete orthogonal set (such as a Fourier series), then the set $\{a_i\}$ can be
computed by a standard inner-product computation, and the representation has a set of well known
characteristics. Orthogonal representations have been widely studied, perhaps because the
calculation of the coefficients is tractable, and orthogonality is comprehensible by humans. The
obvious question becomes 'How closely does the representation $r(x,y)$ correspond to the original
image?' The answer must be qualified by several considerations, for example, 'Is the goal to
create a set of coefficients that preserve the image in detail to the extent that it can be reproduced,
or is the goal to extract a set of features particular to some application?'

$\{L_i(x,y)\}$ can be considered as a generalized set of filters whose individual characteristics will
determine their applicability to a particular problem domain. Filters used for various image
processing applications become subclasses of the generalized filter, each having a set of
characteristics and parameters which distinguish it and define its suitability for a
particular application. Within an application, the number of filters required to achieve the
necessary  performance  becomes  the  issue,  since  this  will  determine  the  computational  complexity 
required  to  generate  the  features.  Having  generated  the  feature  vector,  the  question  becomes 
'What  processing  is  required  to  exhibit  the  required  results?'  Finally,  the  location  of  the  pixels  in 
an  image  need  not  necessarily  be  in  Cartesian  coordinates.  A  polar  representation  is  used  in  some 
cases.  The  selection  of  an  appropriate  set  of  L  functions  becomes  the  major  issue  in  most  vision 
systems. 

Feature  vectors  based  on  the  apparent  preprocessing  performed  by  the  human  eye  have  been 
studied.  These  are  called  Gabor  lenses,  and  image  compression  (and  reproduction)  with  less  than 
one  bit  per  pixel  has  been  reported.  Gabor  lenses  are  also  insensitive  to  translation  of  the  image. 
Fourier-Mellin  lenses  have  also  been  demonstrated,  which  are  insensitive  to  rotation.  These  lenses 
retain  the  essence  of  an  image  with  a  reasonably  small  feature  vector. 

The computation performed by these lenses corresponds to that of a linear neuron. They compute
an inner product of the neuron weights and the input image. The set of such products is the
feature vector. Each lens, however, requires an iterative experimental procedure to determine the
individual lens parameters (the weights), and the number of lenses needed to achieve the desired
compression or fidelity. In the experimental sciences, the lens parameters are adjusted to fit the
experimental observations (say of the image preprocessing in a cat's eye). In the image
classification problem, no such data exist and the parameters and number of lenses must be chosen
by an iterative set of experiments, which hopefully converge to the desired performance.

The selection of statistical feature vectors tends, on the other hand, to be based on image attributes
recognizable (and computable) by humans. The selection of a set of lenses to create a feature
vector depends on the requirements of the problem, e.g., image compression and reproduction,
image classification, etc. No well defined theory exists to guide the choice of features in any of
these approaches. Experimental results are the final validation.

Each neuron in the first layer of a feed-forward neural network contains the same number of
weights as the dimension of the input space. In some sense these weights could be viewed as a
synthetic image. In a trained neural net, each middle layer neuron represents a region in the image
hypervolume. The selection of the number of neurons and their weights, by whatever training
mechanism, must provide a minimal number of appropriately weighted neurons to yield the required
classification accuracy. Other than trial and error, no procedure exists in classical neural training
approaches to guarantee these results.

It  is  proposed  here  that  the  major  difficulty  is  the  lack  of  knowledge  of  the  complexity  of  the 
hypervolumes  occupied  by  each  image  class.  Indeed,  given  that  this  assumption  is  valid,  the 
problem  is  even  more  difficult  because  a  description  of  the  hypervolume  is  impossible  to  obtain. 
Any  approach  to  representing  the  distribution  of  images  in  this  hypervolume  must  be  based  on  this 
assumption. 

Figure 2 illustrates the situation in two dimensions. The image volumes are convoluted and
potentially interlaced as shown. The selection of an unrepresentative training set could create a
partitioning hyperplane as shown. Remembering that a neuron computes a distance measure from
its internal weights, it is clear in this case that a single exemplar at the centroid will cause
overlapping with the neighboring class, as shown in Figure 3. A multilayer network, while
potentially capable of drawing complex boundaries between such classes, must still be given the
correct number of neurons, the correct number of hidden layers and an appropriate training set to
achieve the correct partition. Clearly an image lens based on the centroid of the classes is
completely inappropriate. This problem seemed to define the upper limits of classification
accuracy (regardless of the length of training). The final, apparently insurmountable, problem
seems to be that the shape of the image volumes in image space cannot be determined.
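
The failure of a centroid exemplar can be reproduced with a small two-dimensional example. The
points below are hypothetical and are not taken from the figures: class A is a compact cluster and
class B is a U-shaped band wrapped around it, so that the two centroids lie close together.

```python
def centroid(points):
    """Mean position of a set of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def nearest_exemplar(point, exemplars):
    """Label of the exemplar closest to the point (squared Euclidean distance)."""
    def squared_distance(label):
        ex, ey = exemplars[label]
        return (point[0] - ex) ** 2 + (point[1] - ey) ** 2
    return min(exemplars, key=squared_distance)


class_a = [(-0.5, 1.0), (0.0, 1.0), (0.5, 1.0), (0.0, 1.5), (0.0, 0.5)]
class_b = [(-2.0, 2.0), (-2.0, 1.0), (-2.0, 0.0), (-1.0, -1.0), (0.0, -1.0),
           (1.0, -1.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]

# One exemplar per class, placed at the class centroid.
exemplars = {"A": centroid(class_a), "B": centroid(class_b)}
errors = sum(nearest_exemplar(p, exemplars) != label
             for label, points in (("A", class_a), ("B", class_b))
             for p in points)
print(errors, "of", len(class_a) + len(class_b), "points misclassified")
# 5 of 14 points misclassified
```

No amount of further training moves a single centroid exemplar to a position that separates the
two classes; more exemplars per class, or a different partitioning mechanism, are required.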

Pre-Attentive  Filters  -  The  Theory 

In general, some set of two-dimensional functions $L_i[x,y]$ defined on the same set of pixels as the
image can be defined (in the familiar case, for example, the exponential functions of the Fourier
series), such that a feature vector representing some estimate of the image is generated by the
series:

$$F[x,y] = \sum_i a_i L_i[x,y]$$

where the set $\{a_i\}$ represents the projections on $\{L_i\}$.

The series expansion is an attempt to build up the original function by the superposition of a set of
simpler functions, which have some predefined set of desirable attributes (such as orthogonality).

The resultant $F[x,y]$ is either identical to $I[x,y]$ or is different in some way. $F$ is now processed to
regain $I$ or to derive some attribute of $I$. Clearly if $\{L_i[x,y]\}$ is a complete orthogonal set, then
$F[x,y]$ is an exact representation of $I[x,y]$, and the set $\{a_i\}$ can be computed as the (normalized)
inner product:

$$a_i = \frac{\sum_{x,y} L_i[x,y]\,I[x,y]}{\sum_{x,y} L_i[x,y]^2}$$

The inner products and the projections of a vector on a nonorthogonal set of axes are not the same,
and the projections must be determined according to an optimization criterion. Whatever the
criterion, it should be tractable, and meaningful in practice. Consider, for example, minimizing the
squared norm of the difference between the image and its representation, i.e.:

$$E = \left\| I[x,y] - F[x,y] \right\|^2$$

The difference can be computed by direct substitution at the pixel level:

$$E = \sum_{x,y} \left( I[x,y] - F[x,y] \right)^2$$

Substituting the series expression for $F[x,y]$ and differentiating with respect to $a_i$ yields:

$$\frac{\partial E}{\partial a_i} = -2\sum_{x,y} I[x,y]\,L_i[x,y] + 2\sum_{x,y}\Big(\sum_k a_k L_k[x,y]\Big) L_i[x,y] = 0$$

Satisfying this condition yields a set of simultaneous equations:

$$\sum_{x,y} I[x,y]\,L_i[x,y] = \sum_{x,y}\sum_k a_k L_k[x,y]\,L_i[x,y]$$

The right-hand side represents an inner-product calculation of the projection of each lens on every
lens in the set. This suggests that, to minimize the mean square difference, we must find a set
of coefficients $\{a_i\}$ such that the inner product of each vector $L_i$ with the combination
$\sum_k a_k L_k[x,y]$ is the same as its inner product with the original image. We note that this is
obviously true if:

$$F[x,y] = I[x,y]$$

By substituting for the inner summation, the result can be written as:

$$\sum_{x,y} I[x,y]\,L_i[x,y] = S_i = \sum_{x,y} F[x,y]\,L_i[x,y]$$

We note also that the left hand side is the inner product of the image and the basis vectors. The set
of equations could therefore be represented as a matrix equation:

$$S = L A, \qquad L = [L_{ij}]$$

where $S$ is a vector of length $n$, $L$ is an $n \times n$ matrix whose terms are computed as the inner
products $L_{ij} = \sum_{x,y} L_i[x,y]\,L_j[x,y]$, and $A$ is the vector of the $n$ coefficients $a_i$. For
example, for the two dimensional case:

$$S_1 = L_1 L_1\, a_1 + L_1 L_2\, a_2$$

$$S_2 = L_2 L_1\, a_1 + L_2 L_2\, a_2$$

By inverting the $L_{ij}$ matrix we could solve for the $a_i$ and derive the exact representation of the
image, provided the basis set were complete. The computational task is well known provided the basis
vectors are orthogonal. If they are not, the computational task is formidable for any reasonably
sized matrix, which accounts for the general lack of interest in nonorthogonal representations.
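
For a very small case the matrix equation can be solved directly. The sketch below builds the
matrix of lens inner products and solves for the coefficients by Gaussian elimination; the two
lenses and the image are hypothetical, and the elimination routine is a textbook method chosen for
illustration rather than the procedure used in any of the cited work.

```python
def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("lenses are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lens_coefficients(lenses, image):
    """Least-squares coefficients a_i from the equations S = L A."""
    gram = [[dot(li, lj) for lj in lenses] for li in lenses]   # L_ij
    s = [dot(image, li) for li in lenses]                      # S_i
    return solve(gram, s)


# Two non-orthogonal lenses over a four-pixel image (raster-scanned).
lenses = [[1, 0, 0, 0], [1, 1, 0, 0]]
image = [5, 3, 0, 0]            # equals 2 * lens 1 + 3 * lens 2

print([dot(image, lens) for lens in lenses])   # raw inner products: [5, 8]
print(lens_coefficients(lenses, image))        # projections: [2.0, 3.0]
```

The raw inner products and the projections differ because the lenses overlap. The elimination
costs on the order of the cube of the number of lenses, which is the formidable part for large
filter banks.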

In practice, the off-diagonal terms become an important indicator of the orthogonality of the
pre-attentive filters. If the lengths of all filter vectors are normalized on a unit hypersphere, then the
diagonal terms will be unity and the off-diagonal terms of the matrix, depending on their
size, will show how close the vectors are to being orthogonal.
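
This indicator is simple to compute. In the sketch below the filters are first scaled to unit
length, so that the diagonal of the inner-product matrix is unity and the largest off-diagonal
magnitude measures the departure from orthogonality. The filters and function names are
illustrative.

```python
import math


def normalize(vector):
    """Scale a filter vector onto the unit hypersphere."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector]


def inner_product_matrix(filters):
    """Matrix of inner products between the normalized filters."""
    unit = [normalize(f) for f in filters]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def max_off_diagonal(matrix):
    """Largest off-diagonal magnitude: 0 for an orthogonal set, near 1 if redundant."""
    n = len(matrix)
    return max(abs(matrix[i][j]) for i in range(n) for j in range(n) if i != j)


filters = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
matrix = inner_product_matrix(filters)
print(max_off_diagonal(matrix))  # about 0.707: filters 1 and 2 are 45 degrees apart
```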

The important conclusion from this generalization is that all preattentive filters can be described
by such an expansion, and their detailed character depends on the actual mathematical form of the
$L_i$ terms. Thus the choice of lenses depends on the detailed properties. We note also that each
lens computes an inner product with the input image, thus each lens, regardless of type, has the
same computational loading. The minimal computational load will thus depend only on the
number of lenses required to achieve the desired feature vector.

Invariance  Properties 

Under most conditions encountered in real guidance and control problems, the image space must
be considered unconstrained by orientation and by the image boundaries. In terms of the processing
required in a vision system, this implies that the image (or object) can be translated both
horizontally and vertically and arbitrarily rotated. In some cases, the image will be subjected to
magnification or contraction. This requirement places a very limiting constraint on the generation
of the feature vector, which must, if required, preserve the set of features under all
these variations.

An essential characteristic in the definition of a vision system is the limitation on translational
and rotational invariance, and as a result the selection of the feature vector must reflect these
requirements.

Fourier-Mellin Filters

Filters based on Fourier coefficients depend on the spectral information contained in the image.
Fourier coefficients can be computed along the image vector (considered as a time series), as a two
dimensional transform in x and y coordinates, or as a polar transform. The major weakness of this
approach is the large number of coefficients required to generate the feature vector. To
completely capture the image suitable for reproduction theoretically requires an infinite series.
Several alternatives to reduce the number of coefficients are used in practice, including the
Gabor-Logon and the Fourier-Mellin variations.

The  Fourier-Mellin  transform  is  a  two  step  procedure  which  first  computes  the  polar  Fourier 
transform,  and  then  the  energy  moments  along  the  radius. 

The approach begins with a representation of the image in polar coordinates, $I(r,\theta)$. A set of
circular harmonics can be generated as:

$$F_m(r) = \frac{1}{2\pi}\int_0^{2\pi} I(r,\theta)\,\exp(-im\theta)\,d\theta$$

where the circular harmonic frequency $m$ is an integer. The $F_m(r)$ are referred to as circular
harmonic functions (CHF). They give the energy at each frequency as a function of the radius.

The image can be reconstructed as a Fourier series by:

$$I(r,\theta) = \sum_{m=-\infty}^{\infty} F_m(r)\,\exp(im\theta)$$

These coefficients can be (and are) used in some applications as the feature vector. In some cases
the energy distribution of each harmonic as a function of the radius can be used. This distribution
can be modelled using moments, which are computed in general as a Mellin transform:

$$M_{s,m} = \int_0^{\infty} r^{s-1} F_m(r)\,dr$$

These coefficients are referred to as Fourier-Mellin descriptors. In general, $s$ can be a complex
number; in practice, it is usually real. It is usually the case that a few moments will be sufficient
to describe the image.

An F-M spatial filter is constructed by generating impulse response functions of the form:

$$F_r(x,y) = \left\{ r^{s-2} \exp(im\theta) \right\}^*$$

where $*$ indicates the complex conjugate.

Scale and intensity invariance can be obtained by suitable normalization of the F-M descriptors.
If the descriptors are computed for real $s$, and the scale and intensity of an image are
varied by multipliers $a$ and $k$ respectively, the descriptors $M'_{s,m}$ of the altered image become:

$$|M'_{s,m}|^2 = a^{2s} k^2\,|M_{s,m}|^2$$

The scale and intensity invariance can be achieved by defining a normalized invariant feature as:

$$\Phi(s,m) = \frac{|M_{s,m}|^2}{|M_{s,0}|^2}$$

All moments of the same order suffer the same multipliers, and hence the feature $\Phi(s,m)$ remains
invariant under translation, rotation, scale and illumination.

The advantages of this approach are:

1. The representation is completely invariant.

2. The number of moments required is normally small.
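
A discrete sketch of these descriptors is given below for an image already sampled on a polar
grid (a list of rings, one per radius). The radial integral is approximated by a Riemann sum over
equally spaced radii, and the feature is normalized by the zero-order harmonic of the same order
s; that choice of normalization, the sample data and the function names are assumptions made for
illustration. Scale invariance is not checked, since resampling a discrete grid is only
approximate.

```python
import cmath
import math


def circular_harmonic(ring, m):
    """Discrete F_m at one radius: mean of I(theta_k) * exp(-i m theta_k)."""
    n = len(ring)
    return sum(v * cmath.exp(-1j * m * 2 * math.pi * k / n)
               for k, v in enumerate(ring)) / n


def mellin_descriptor(polar_image, radii, s, m):
    """Riemann-sum approximation of M_{s,m} = integral of r^(s-1) F_m(r) dr."""
    dr = radii[1] - radii[0]  # assumes equally spaced radii
    return sum(r ** (s - 1) * circular_harmonic(ring, m) * dr
               for r, ring in zip(radii, polar_image))


def invariant_feature(polar_image, radii, s, m):
    """|M_{s,m}|^2 divided by |M_{s,0}|^2 (assumed normalization)."""
    numerator = abs(mellin_descriptor(polar_image, radii, s, m)) ** 2
    denominator = abs(mellin_descriptor(polar_image, radii, s, 0)) ** 2
    return numerator / denominator


# Hypothetical image: three rings of eight angular samples each.
radii = [1.0, 2.0, 3.0]
polar_image = [
    [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 2.0, 0.0, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, 3.0, 1.0, 0.0, 0.0, 0.0],
]
rotated = [ring[-3:] + ring[:-3] for ring in polar_image]       # rotate by 3 samples
brighter = [[2.5 * v for v in ring] for ring in polar_image]    # intensity k = 2.5

base = invariant_feature(polar_image, radii, 2, 1)
print(base, invariant_feature(rotated, radii, 2, 1),
      invariant_feature(brighter, radii, 2, 1))
```

A rotation only multiplies every $F_m(r)$ by a common phase factor, and an intensity change
multiplies numerator and denominator by the same $k^2$, so the three printed values agree to
within rounding error.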


Gabor  Filters 

A Gabor filter is a variant on the Fourier approach. The Gabor filter consists of a
two-dimensional Fourier transform weighted by a two-dimensional Gaussian function. The result is a
filter which is translationally invariant, but is rotationally dependent.

The two-dimensional Gabor filter is represented as a series of two-dimensional Gabor functions:

$$t(x,y) = \sum_i a_i G_i(x,y)$$

The two-dimensional Gabor function is the product of a 2-D sinusoid and a 2-D Gaussian weighting
function. This has been shown by Daugman [C-2] to achieve a minimal space-time uncertainty, and
also to provide a model of animal vision systems.

The  initial  Fourier  spectrum  yields  an  orthogonal  basis  for  examining  the  image,  however,  the 
Gaussian  weighting  function  renders  the  final  feature  vector  non-orthogonal. 

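Define

$$G(x,y) = M(x,y) \cdot W(x,y)$$

where $M$ is the 2-D sinusoid and $W$ is the Gaussian weight.

Let

$$M(x,y) = \exp\{-2\pi i\,(u_0 x + v_0 y)\}$$

where $u_0$ and $v_0$ are spatial frequencies in cycles per radian.

This function can be centered at an arbitrary point $(x_m, y_m)$ in the image. The inner product of
two Gabor functions $G_i$ and $G_j$ is:

$$\langle G_i(x,y),\, G_j(x,y) \rangle = \exp\left\{-\pi\left[\frac{(u_i - u_j)^2}{a_i^2 + a_j^2} + \frac{(v_i - v_j)^2}{b_i^2 + b_j^2}\right]\right\}$$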

Daugman [C-2] has also shown that these functions achieve the maximum possible joint resolution in
the conjoint 2-D visual and 2-D frequency domains. He has shown that they achieve the
theoretical lower bound on joint uncertainty in the two conjoint domains: $(x,y)$, the visual space, and
$(u,v)$, the spatial frequency domain. Defining uncertainty in each of the four variables by the
normalized second moments $\Delta x$, $\Delta y$, $\Delta u$, $\Delta v$ about the principal axes, he has
shown that for

$$(\Delta x)(\Delta y)(\Delta u)(\Delta v) \ge \frac{1}{16\pi^2}$$

the lower bound is attained by the 2-D Gabor functions.

Advantages: Compression of less than one bit per pixel has been reported.

Disadvantages: There are eight parameters that must be set either by experiment or simulation.
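
A single Gabor function can be sketched as the product of the complex sinusoid
$\exp\{-2\pi i(u_0 x + v_0 y)\}$ and a Gaussian weight. The explicit form of the weight used below,
$\exp\{-\pi[(x-x_0)^2 a^2 + (y-y_0)^2 b^2]\}$, is an assumption made for illustration, as are the
parameter names and values; it covers only six parameters and is not a complete statement of the
eight-parameter filter.

```python
import cmath
import math


def gabor(x, y, x0, y0, a, b, u0, v0):
    """Complex sinusoid at (u0, v0) weighted by a Gaussian centered on (x0, y0)."""
    # Assumed Gaussian weight; a and b set the inverse widths in x and y.
    weight = math.exp(-math.pi * (((x - x0) * a) ** 2 + ((y - y0) * b) ** 2))
    carrier = cmath.exp(-2j * math.pi * (u0 * x + v0 * y))
    return weight * carrier


def gabor_lens(size, **params):
    """Sample the function on a size x size pixel grid to form one lens."""
    return [[gabor(col, row, **params) for col in range(size)]
            for row in range(size)]


params = dict(x0=4.0, y0=4.0, a=0.5, b=0.5, u0=0.25, v0=0.0)
lens = gabor_lens(9, **params)
print(abs(lens[4][4]))  # magnitude at the center of the Gaussian weight
print(abs(lens[4][6]))  # magnitude two pixels away along x
```

The real and imaginary parts of such a lens give the even and odd (cosine and sine) responses.
Every parameter must still be chosen by experiment for the application at hand.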

Object  Recognition  -  The  Work  of  Sheng 

A group led by Professor Yulong Sheng at Laval University [A] has published a wide variety of
theoretical  and  experimental  papers  exploiting  the  Fourier-Mellin  approach  to  generating  feature 
vectors. 

It is evident from the equations that the Fourier-Mellin transform does not yield translational
invariance, because of its dependence on the center point chosen for the polar representation of the
image. In most applications, some choice of center is necessary, often the center of gravity of the
picture, or some such definition that can be found by scanning.

Arsenault and Sheng [A] have proposed some practical limitations on the number of harmonic
components necessary to represent an image (a space shuttle) on a uniform background. The
centroid of the shuttle was computed and chosen as the center of the filter. The image was
reconstructed using increasingly higher order harmonic components. Their experiment showed
that up to thirty-seven components were needed to provide good detail (e.g., to show the tip of the
tail).

They  concluded  form  this  experiment  that  a  simple  inverse  relationship  existed  between  the 
angular  dimension  of  the  object  and  the  angular  frequency  of  the  CHC: 

"An  object  detail  subtending  and  angle  of  InftAc  at  the  center,  where  Me  (an  integer)  can  be 
desenbed  by  CHC  orders  up  to  Me." 

As  a  consequence  of  this,  they  proposed  that  for  an  image  of  NxN  pixels,  the  maximum  circular 
harmonic  frequency  is  equal  to  the  integer  part  of 

They observed also that if an object has n-fold rotational symmetry, the image has an angular
periodicity of 2π/n. Thus the CHC are different from zero only at the discrete angular frequencies

$$m = 0, \pm n, \pm 2n, \ldots$$

Two-fold and even four-fold symmetry is not uncommon in some image classification problems.
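
The symmetry property can be illustrated with a small numerical sketch. It is an assumption-laden toy rather than the procedure used by Arsenault and Sheng: the image is reduced to samples of intensity taken around a single ring about the chosen center, and the angular Fourier coefficients of that ring stand in for the circular harmonic components at one radius. The function name and the test pattern are hypothetical.

```python
import cmath
import math


def angular_harmonics(ring_samples, max_order):
    """Angular Fourier coefficients c_m of samples taken evenly around a ring."""
    count = len(ring_samples)
    return {
        m: sum(
            value * cmath.exp(-1j * m * 2.0 * math.pi * j / count)
            for j, value in enumerate(ring_samples)
        ) / count
        for m in range(-max_order, max_order + 1)
    }


# A ring pattern with four-fold rotational symmetry (period 2*pi/4).
n_fold = 4
count = 64
ring = [
    1.0
    + math.cos(n_fold * 2.0 * math.pi * j / count)
    + 0.5 * math.cos(2 * n_fold * 2.0 * math.pi * j / count)
    for j in range(count)
]

coefficients = angular_harmonics(ring, max_order=9)
nonzero_orders = sorted(m for m, c in coefficients.items() if abs(c) > 1e-9)
# nonzero_orders is [-8, -4, 0, 4, 8]: only multiples of n_fold survive.
```

For a pattern with n-fold symmetry, only every n-th harmonic order therefore needs to be computed and stored.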

Image  Exemplars  -  A  Generalization 

When a filter array has been defined, each filter is an array of numbers corresponding to the
dimensionality of the input image space. The filter could therefore be considered as a synthetic
image, and the result of the calculation is an inner product of the input image and the filter. Each
filter in the bank contributes to the feature vector a number representing its closeness to the input
image. The set of numbers must then be evaluated depending on the application.
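
As a minimal sketch of this calculation, assuming that images and filters are held as nested lists of equal shape (the function name and the tiny 2x2 example are hypothetical), the feature vector is one inner product per filter:

```python
def feature_vector(image, filter_bank):
    """Return one inner product per filter; each filter has the image's shape."""

    def inner_product(a, b):
        # Sum of element-wise products over all pixels.
        return sum(p * q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))

    return [inner_product(image, synthetic_image) for synthetic_image in filter_bank]


image = [[0, 1],
         [1, 0]]
bank = [
    [[0, 1], [1, 0]],  # filter identical to the input image
    [[1, 0], [0, 1]],  # filter with no overlap with the input image
]
features = feature_vector(image, bank)  # [2, 0]
```

The larger value for the first filter reflects its closeness to the input image; how the resulting numbers are thresholded or classified is left to the application.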

In a sense each filter forming the feature vector could be regarded as a synthetic image in image
space. The task is to find a set of such vectors to yield the features useful for the task at hand.
Since an image occupies a potentially convoluted volume, it seems reasonable to suspect that,
regardless of the mechanism for arranging the weights of the pre-attentive filters, the end result
is a set of vectors which cover the image volumes for each class of image in such a way as to
obtain the generalization needed for the task. The filters are in some sense a set of exemplars of
the volume occupied by the class. Based on this model there may be some hope in the future
of synthesizing the appropriate exemplars as a function of the optimization requirements.

SUMMARY  AND  CONCLUSIONS 


Summary 

The principal thesis developed in the preceding sections has been that the architecture for machine
vision systems will probably be based on some model of the animal vision system. This model suggests a
two-fold partition of functionality: first the extraction from the image space of a set of features,
followed by some form of interpretive function responsible for creating the appropriate response.

Feature  extraction  may  be  done  directly  by  some  form  of  sensor  suite  or  be  preceded  (as  is 
common  now)  by  some  scanning  mechanism,  which  in  turn  supplies  an  image  representation  to 
synthetic  feature  extractors.  In  any  event,  the  result  is  the  same;  a  feature  vector  must  be 
interpreted  by  subsequent  processing  to  derive  the  final  result. 

The  subsequent  processing  may  be  directed  toward  such  functions  as  target  location  and 
identification,  classification  of  global  image  features,  or  the  exact  reproduction  of  the  image  in  a 
reduced  format  from  that  produced  by  the  scanners. 

We have shown that feature extraction can be represented by a general mathematical model;
however, we have not been able to show how this could be applied to particular requirements.

Conclusions 

There is at this time no global approach to defining the desired task of the vision system and
synthesizing the components necessary to optimize this task. A variety of feature generation
mechanisms have been studied, and experimental results are available for different tasks; however,
no known optimization procedure exists at this time. On the other hand, vision systems seem to
encompass a wide variety of techniques in neural computing not normally associated with 'seeing.'
Perhaps vision as understood by machines is a larger activity than that normally associated with
seeing.

It seems clear, however, that machine vision systems will evolve according to particular needs, and
the final integration of these into a human-like capability will probably be a result of advances both
in the physiology and psychology of animal vision systems, combined with the development of
mathematical models and the creation of the appropriate processing capabilities. The future
development of vision systems will occur in a fragmented way depending on specific requirements,
as we increase our understanding of mechanisms for deriving the appropriate features for the task
at hand.

The computational complexity of vision systems demands a high level of computer capability, and it
is probably safe to say that while an understanding of the process can be obtained by simulation in
software, the eventual development of real-time systems will depend on hardware for both the
sensors and interpretation. Neither of these possibilities is too remote. Gabor filters are now
being developed by HNC, and high performance analog neural chips are available from Intel.
These chips (80170NW) each contain 64 neurons, each containing 80 weights. The chip achieves
two billion multiply-accumulate operations per second. The next few years will see special purpose
vision systems in wide availability and use.


ANNOTATED BIBLIOGRAPHY


A.  Target  Recognition 

The following conference proceedings contain numerous papers directed at the problem of target
recognition using neural networks. They are an obvious first reading for this particular application of
neural computing.

1. "Neural Networks for Automatic Target Recognition," A Research Conference at the Wang
Institute, Boston University, May 11-13, 1990.

The following papers by Sheng and associates outline the theory and practical application of
Fourier-Mellin filters to the problem of target identification:

2. Yulong Sheng, Henri H. Arsenault, "Experiments on Pattern Recognition using Invariant
Fourier-Mellin Descriptors," J. Opt. Soc. Am. A/Vol. 3/No. 6, June 1986, pp. 771-776.

3. Yulong Sheng, Henri H. Arsenault, "Object Detection from a Real Scene using the Correlation
Peak Coordinates of Multiple Circular Harmonic Filters," Applied Optics, Jan 15, 1989/Vol.
28/No. 2, pp. 245-249.

4. Yulong Sheng, et al., "Frequency-Domain Fourier-Mellin Descriptors for Invariant Pattern
Recognition," Optical Engineering, May 1989/Vol. 27/No. 5, pp. 345-357.

5. Yulong Sheng, "Fourier-Mellin Spatial Filters for Invariant Pattern Recognition," Optical
Engineering, May 1989/Vol. 28/No. 5, pp. 494-500.

B.  Image  Compression 

The following paper by Gabor is the original derivation of the Gabor Logon. It is written in an older
frame of reference and is somewhat difficult to read (depending on your background):

1. Gabor, D., "Theory of Communication," Journal I.E.E., London 1946, pp. 429-457.

In this seminal paper Daugman brings together the work of many previous researchers and
demonstrates the image compression and segmentation capabilities of the Gabor pre-attentive filters.
The  paper  contains  a  host  of  references  to  earlier  work. 

2.  John  G.  Daugman,  "Complete  Discrete  2-D  Gabor  Transformations  by  Neural  Networks  for 
Image  Analysis  and  Compression,"  IEEE  Transactions  on  Acoustics,  Speech  and  Signal  Processing, 
Vol.  36,  No.  7,  July  1988,  pp.  1169-1179. 

C.  Classification 

Lois Garand, "Automated Recognition of Oceanic Cloud Patterns and its Application to Remote
Sensing of Meteorological Parameters," Ph.D. Thesis, Department of Meteorology, University of
Wisconsin-Madison, 1986.

B. Archie Bowen and Jianli Liu, "Pattern Classification from Raster Data using Vector Lenses,
Neural Networks and Expert Systems," Mapping and Modelling for Navigation, NATO ASI Series
F, Vol. F65, Edited by L. F. Pau, 1990.

D.  Pre-Attentive  Filters 

An  excellent  paper  on  the  general  area  of  pre-attentive  filters  is  contained  in 

1. John G. Daugman, "Six Formal Properties of Two-Dimensional Anisotropic Visual Filters:
Structural Principles and Frequency/Orientation Selectivity," IEEE Trans. Systems, Man, and
Cybernetics, Vol. SMC-13, No. 13, September/October 1983.

2. John G. Daugman, "Uncertainty Relation for Resolution in Space, Spatial Frequency and
Orientation Optimized by Two-dimensional Visual Cortical Filters," J. Opt. Soc. Am. A/Vol. 2,
No. 7/July 1985, pp. 1160-1169.

3. M. R. Turner, "Texture Discrimination by Gabor Functions," Biological Cybernetics,
Springer-Verlag, Vol. 55, 1986, pp. 71-82.

4. Eric Saund, "Dimensionality-Reduction Using Connectionist Networks," IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 11, No. 3, March 1989, pp. 304-314.

E.  Animal  Vision  Systems 

The following papers present experimental evidence of the functionality of animal vision systems:

1. Jones, J. P., and L. A. Palmer, "An Evaluation of the Two-Dimensional Gabor Filter Model of
Simple Receptive Fields in Cat Striate Cortex," Jour. of Neurophysiology, Vol. 58/No. 6, Dec
1987, pp. 1233-1258.

2. John G. Daugman, "Two-Dimensional Spectral Analysis of Cortical Receptive Field Profiles,"
Vision Research, Vol. 20, pp. 847-856, Pergamon Press Ltd., 1980.

3. John G. Daugman, "Uncertainty Relation for Resolution in Space, Spatial Frequency, and
Orientation Optimized by Two-Dimensional Visual Cortical Filters," J. Optical Soc. Am. A, Vol.
2, No. 7, July 1985, pp. 1160-1169.


Figure 1: Vision Systems Models (panel labels: Animal, with Lens and Retina; Mechanical, with Generation and Interpretation)

Figure 2: Image Volumes in 2-D (panel B: Partially Overlapped)

Figure 3: Image Volume Centroids




Neural  Networks  for  Military  Robots 
Dr.  W.A.Wright 

Sowerby  Research  Centre 
FPC  267  British  Aerospace 
Bristol  BS12  7QW. 


This paper gives a short review of mobile robotic
research and, through three case studies which
describe, in brief, current research undertaken at three
establishments, indicates the role that neural networks
are playing in this process and hence the impact that
they may have on the military environment.

The three case studies are chosen to illustrate the
advantages, in terms of speed, compactness, and
adaptability, of the use of these systems in what are
defined as “the three essential functional areas for mobile
robot control”:

•  localisation  (where  am  I?), 

•  path  planning  (where  do  I  want  to  be?), 

•  obstacle  avoidance  (is  there  anything  in  the 
way?). 

The first case study describes an ultrasonic obstacle
avoidance system that has been developed by the
German company IBP Pietzsch for the ESPRIT II project
ANNIE. The second is a description of an investigation,
carried out at the Sowerby Research Centre, also
for the ANNIE project, into the use of a neural system
for the localisation of a known mobile robot by the
appropriate “fusing” of data obtained from several off-board
sensors. The last study describes a VLSI implementation
of a localisation and path planning system that
has been designed and constructed by the University of
Oxford’s Robotics Group.

Although  it  is  not  intended,  by  presenting  these  case 
studies,  to  portray  them  as  the  extent  of  the  state  of 
the  art  in  this  field  it  is,  however,  hoped  that  they 
will  give  a  clear  idea  of  how  and  why  neural  networks 
are  being  used  in  this  area,  and  illustrate  the  potential 
advantages  to  be  gained  from  their  use  in  the  field  of 
military  robotics. 


Introduction 

Over the past few years there has been a keen interest
in the development of the military robot. This has
been reflected not only by the large amount of work on
mobile robotics that has been undertaken at various
establishments throughout the world (see appendix A)
but also by the funding that has been made available
by both the Department of Defence in the USA and
the British and other European Defence Ministries for
research projects aimed at investigating and developing
such systems. The most notable of these projects
are possibly the DARPA ALV (Simpson 1987) initiative,
the French ALV initiative ROVA (Savage 1991)
and the British Mobile Advanced Robotics Defence
Initiative MARDI (Bateman 1991). These projects have
concentrated or are concentrating upon the production
of an all-terrain autonomous mobile vehicle capable
of navigating through an uncertain environment,
on a reconnaissance mission mapping out the terrain
or seeking out a particular target, for instance. Other
civil projects, the most notable being the Mars Rover
(Wolfe and Chun 1987), can in some circumstances be
seen as derivatives of these¹. The use of mobile robots
in the military arena is therefore not a thing of science
fiction. The autonomous mobile robot is a real
tracked, wheeled, multi-legged, or even flying vehicle.

In general, however, robotic systems developed and
actually used in the 1980s come in a very different guise.
The robots that have already found their way into
factory production lines are not mobile vehicles but
static jointed arms or the more extensive automated
assembly units. These are used in the manufacturing
industry for the automated production of anything
from PCBs through to cars or washing machines. In
comparison to the static systems the functionality of
industrial mobile robotic systems is much less
developed. In general most industrial mobile systems
are either controlled remotely by an operator or
operate in very restricted environments, such as on the
factory floor following a buried metal strip. The truly
autonomous mobile robot, which is of prime interest
to the military, is still very much a novelty.

The major problem involved in producing a truly
autonomous mobile robot is that, although in many
cases the processing required is understood, hardware
limitations prevent it from being carried out with a
speed that is great enough on equipment that is small
enough to be practical. As devices have become faster
and faster this imbalance between processing ability,
size, and processing power is being redressed. This
paper intends, through a short review of mobile robotic
research and the use of three case studies which
describe, in brief, current research undertaken at three
establishments, to indicate the role that neural networks
are playing in this process and hence the impact
that they may have on the military environment.

¹ A list outlining the main ALV projects and the institutions
involved in these is given in appendix A.




It is not intended in this paper to produce a complete
overview of the use of neural networks for mobile
robots, since this would be too great a task. Nor
is it intended, by presenting these few case studies,
to portray them as the extent of the state of the art
in this field. This would be very unfair to a large
number of very able workers. It is intended, however,
through these case studies to demonstrate how this
technology is being used and the potential advantages
that can be obtained with the technology. Each case
study reviews a piece of on-going research, attempts
to highlight the relationship to mobile robotics in
general and the contribution made by neural networks in
particular, summarises experimental results, and
discusses their implications.

Each  case  study  describes  the  use  of  neural  networks 
in  what,  for  the  purposes  of  this  paper,  are  defined 
as  the  three  principal  functional  areas  of  any  mobile 
robot: 

•  localisation  (where  am  I?), 

•  path  planning  (where  do  I  want  to  be?), 

•  obstacle  avoidance  (is  there  anything  in  the 
way?). 

The thinking behind these functions and the
constraints they impose, together with a brief résumé of
the work now being undertaken in the area of mobile
robotics with particular regard to the use of neural
networks, are given in the next section.

The first two case studies stem from the ANNIE
project (The Application of Neural Networks for
Industry in Europe), of which British Aerospace is a full
partner. This is an ESPRIT² II project, which is
supported by the European Commission, and aims to
investigate the use of neural networks in areas
relevant to European industry. The project is divided
into three application areas:

•  image  processing, 

•  optimisation, 

•  control. 

Both  the  ANNIE  case  studies  describe  work  that  is 
being  conducted  within  the  control  application  area 
which  has  concentrated  on  the  investigation  of  the  use 
of  neural  networks  in  areas  of  particular  relevance  to 
mobile  robotics. 


² ESPRIT (European Strategic Programme for Research in
Information Technology) is a European Commission funding body
for collaborative research in the area of, as the name suggests,
information technology. This includes many varied areas, from
office systems to industrial robots.


The first case study describes work carried out by IBP
Pietzsch, a small German company specialising
in the production of inertial and robotic platforms.
Here, neural networks have been used to provide
an obstacle avoidance function by studying the
signatures obtained from a bank of ultrasonic sensors
placed around the robot. Although slightly artificial,
this study provides a graphic illustration of the
use of a neural network for sensor/motor association,
an area where the use of neural networks is becoming
more prevalent.

The second case study describes the work that has
been carried out recently by British Aerospace for the
ANNIE project at the company’s corporate research
laboratories, the Sowerby Research Centre. Here a
neural network has been integrated into a vision based
surveillance system which, by matching the data
processed by the surveillance system with data derived
from a mobile robot’s own sensors, is able to identify
and so localise the robot. The work demonstrates the
use of neural networks for data fusion and the advantage
to be gained from a hybrid system which comprises
several neural networks.

The last case study describes the work now being
undertaken at the University of Oxford’s Robotics Group
under Dr Tarassenko. The group has succeeded in
constructing a small working autonomous robot based
on the analogue Pulsed Stream CMOS chips that have
been designed at Oxford in conjunction with Edinburgh
University’s Department of Electrical Engineering.
The work represents one of the first demonstrations
of the integration of neural network hardware
into the control architecture of a mobile robot and
illustrates the advantages, in terms of speed and
compatibility, that can be gained from these systems. The
Oxford project is particularly concerned with the
production and demonstration of an integral localisation
and path planning system for a mobile robot.

Finally, the possibilities that lie in store in the area of
mobile robotics are briefly reviewed in the final section.
Although it is always hard to predict new developments
with any certainty, it is hoped that at least
some idea of what the future might hold is given here.

Background 

What  is  an  Autonomous  Mobile  Robot? 

As stated in the introduction, there is now a vast variety
of robotic systems. However, this paper will concentrate
upon the use of neural networks in the design
of mobile robots and their impact in the military
environment. The first requirement in such a discussion
is to define what is meant by the phrase autonomous
mobile robot. For the purposes of this paper it will be
taken that an “autonomous mobile robot” is:
any self contained system which is able to
move through an “environment” as part of a
process of achieving certain goals or objectives,
and further is able to react to changes
or unforeseen events in that environment, in
order to pursue the achievement of those
objectives, in “real time”.

“Environment” here can mean anything from the
laboratory, which is structured and usually well
understood, through to open country, which will be
unstructured and possibly at best only partially known.

Here  “real  time”  means: 

to be able to react to a stimulus at sufficient
speed such that any action taken as a result
of that stimulus occurs in time for that action
to be relevant.

What is regarded here as “real time”, therefore,
changes depending upon the environment and the
performance required of the robot moving through that
environment. For instance, the real time constraints
required for short term obstacle avoidance will differ
from those required for long term path planning. A
vehicle that requires several minutes to calculate a new
trajectory around an obstacle, where that obstacle is
only seconds away, cannot be said to be acting in “real
time”, whereas a vehicle that takes minutes to calculate
a path that will take hours to negotiate may certainly
be regarded as processing in “real time”. The
real time requirements for a mobile robot can therefore
range from a few milliseconds to possibly several
minutes, depending upon the circumstances. The
time constraints of the various embedded control loops
also have a major influence on the interpretation of the
phrase “real time”. This is a key element of the real
time requirement for mobile robotic systems, and one
for which appropriate processing architectures must
be designed.
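
The definition can be reduced to a simple comparison, sketched below under the assumption that both the time needed to compute a response and the time remaining before the response ceases to be relevant can be estimated in seconds. The function name and the example figures are hypothetical and only restate the two vehicles described above.

```python
def acts_in_real_time(reaction_time_s, time_until_irrelevant_s):
    """True if the response is ready while it can still influence the outcome."""
    return reaction_time_s <= time_until_irrelevant_s


# Three minutes to replan around an obstacle that is five seconds away.
obstacle_case = acts_in_real_time(reaction_time_s=180.0, time_until_irrelevant_s=5.0)

# Three minutes to plan a route that will take two hours to negotiate.
route_case = acts_in_real_time(reaction_time_s=180.0, time_until_irrelevant_s=2 * 3600.0)
```

The same planner is therefore outside real time in the first case and within it in the second; the deadline is set by the task and not by the processor.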

Functional  Requirements  for  a  Mobile 
Robot 

The types of processing required in real
time for any autonomous robot moving through a
changing and uncertain environment are summarised,
for the purpose of this paper, under three headings:

Path planning: given the position of the robot in
the environment, path planning is required to allow
the robot to reach its desired destination,
whilst allowing for the relevant factors in the
environment such as any hazards and difficult terrain.
Often these environmental factors may change, due
to unforeseen events, obstacles, etc. In general,
therefore, it is desirable that the map of the
environment and the path planning system are
adaptable to accommodate these.


Localisation: given a map of the environment, the
position of the robot in that environment needs to
be determined. In practice this can be obtained
in a variety of ways. Dead reckoning, using the
vehicle’s inertial navigation or odometry, can in
some cases be sufficient. However, other methods
are available, such as the use of beacons or
GPS satellite localisation. Other methods, neural
implementations of which are described in the
case studies, use observed sensor information and
compare that with a taught or preprogrammed
world model.

Obstacle avoidance: a mobile robot must have the
ability to respond to unexpected obstacles in
its “immediate” path. Often such systems involve
the use of computer vision techniques, the
use of active sensors (e.g. ultrasonics, laser
range finders, etc.), or a combination of both
(Thorpe et al. 1987).
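
The dead reckoning mentioned under localisation can be sketched as a pose update driven by odometry. This is a minimal illustration only: it assumes a planar robot, a pose held as (x, y, heading in radians), and a motion model in which the vehicle first turns and then travels in a straight line. The function name and motion model are hypothetical and are not drawn from any of the systems cited.

```python
import math


def dead_reckon(pose, distance, heading_change):
    """Advance a planar pose (x, y, heading) by one odometry step."""
    x, y, heading = pose
    heading += heading_change  # turn first
    return (
        x + distance * math.cos(heading),  # then travel in a straight line
        y + distance * math.sin(heading),
        heading,
    )


pose = (0.0, 0.0, 0.0)
pose = dead_reckon(pose, distance=1.0, heading_change=0.0)          # 1 m along x
pose = dead_reckon(pose, distance=2.0, heading_change=math.pi / 2)  # turn left, 2 m along y
```

Because every step adds its own measurement error to the running estimate, the position drifts with distance travelled, which is why the text notes that dead reckoning is sufficient only in some cases and that beacons, satellite fixes, or comparison with a world model are used to correct it.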

These three criteria, obviously, give a somewhat
restricted view of the functionality of a mobile robot.
The fact that any robotic system may have other
subsidiary goals, such as searching, or tracking and
following a particular object, has been ignored.

It is clear that the relative real time constraints for
each of these functions will differ from one to the
next. In the simplest case, where obstacle avoidance is
purely reactive and has no input to the path planning,
then this function is required to have the shortest
response time. However, in practice the response times
for the other functions increase as the complexity of
the interrelations between the differing functions is
increased.

As has already been mentioned, the high level of
computation involved in creating a real time system
with this degree of functionality in the relatively
small space available on an autonomous system
is one of the major limitations of current
systems and has troubled many research programmes,
such as the DARPA ALV (Simpson 1987).
More recent ALV programmes, such
as the ESPRIT II programme PANORAMA³
(Vacherand et al. 1990) and the Universität der
Bundeswehr ALV (Dickmanns 1990), have successfully
overcome this problem by using dedicated image
processing hardware coupled with a distributed parallel
processing system. In the case of PANORAMA
this consists of Transputers together with a variety
of other processors (a SUN 4 and several 68000s)
(Vacherand et al. 1990).

³ Perception and Navigation Organisation for Autonomous
Mobile Applications

Other Initiatives

In some limited cases the above requirements have
been met, either by restricting the functionality
of the robot or by ensuring that the robot has only
to function in a well ordered and limited environment.
For the industrial market limited “autonomous” systems
are now available (e.g. the cleaning robot produced
by Robosoft in Paris). These are generally
robots with a very limited functionality designed to
clean floors or transport materials. The functionality
of these systems is usually limited to simple obstacle
avoidance. This is achieved through the analysis
of returns obtained from an active sensor or sensors,
usually ultrasonic, placed on the robot. These allow
the proximity of objects to be determined and so
theoretically any obstacles in the path of the robot may
be detected. Any path planning that is performed in
these limited systems is usually pre-programmed into
the robot before operation and is therefore not adaptive.
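
The proximity measurement described above can be sketched as a time-of-flight calculation. The figures and names below are assumptions for illustration: the speed of sound is taken as 343 m/s (dry air at about 20 °C), each sensor is assumed to report the round-trip time of its echo in seconds, and the stopping distance is an arbitrary threshold. None of this describes the Robosoft product or any other particular system.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees C


def echo_range_m(round_trip_s, speed=SPEED_OF_SOUND_M_S):
    """Distance to the reflecting object; the pulse travels out and back."""
    return speed * round_trip_s / 2.0


def obstacle_ahead(round_trip_times_s, stop_distance_m):
    """True if any sensor reports an object closer than the stopping distance."""
    return any(echo_range_m(t) < stop_distance_m for t in round_trip_times_s)


range_10ms = echo_range_m(0.010)                    # about 1.7 m
blocked = obstacle_ahead([0.010, 0.002], 0.5)       # second echo is about 0.34 m away
clear = obstacle_ahead([0.010], 0.5)
```

A purely reactive system of this kind stops or turns when `obstacle_ahead` is true; it has no map, which is why any path planning must be pre-programmed.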

In the military field the examples of working
autonomous vehicles are not as common. Perhaps the
most dramatic, as has been highlighted by the recent
Gulf war, is the Cruise Missile. This uses a localisation
system called TERCOM to update its position and
so allow mid-course correction to the missile’s flight
path. The TERCOM system works by matching the
ground terrain with a map of the ground relief held
digitally in the memory of the missile. Such a system,
therefore, exhibits two of the main functions of an
autonomous vehicle: localisation and path planning.
Obstacle avoidance, particularly with regard to the
terminal phase of the missile’s mission, is, not surprisingly,
omitted.
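
The matching step can be illustrated with a much simplified one-dimensional sketch. It assumes that the stored relief is a list of elevations along a track, that the vehicle has measured a short elevation profile beneath it, and that the best fit is the offset with the smallest mean absolute difference. The function name, the data, and the choice of cost are illustrative assumptions and do not describe the operational TERCOM system.

```python
def locate_profile(stored_relief, measured_profile):
    """Return (offset, cost) of the best fit of the measured profile in the map."""
    best_offset, best_cost = None, float("inf")
    length = len(measured_profile)
    for offset in range(len(stored_relief) - length + 1):
        window = stored_relief[offset:offset + length]
        # Mean absolute difference between stored and measured elevations.
        cost = sum(abs(a - b) for a, b in zip(window, measured_profile)) / length
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost


stored_relief = [100, 102, 110, 125, 120, 108, 101, 99, 104, 115]  # metres
measured_profile = [121, 109, 100]  # noisy readings taken over cells 4 to 6

offset, cost = locate_profile(stored_relief, measured_profile)  # (4, 1.0)
```

The offset gives the position fix against the map; the difference between this fix and the position expected from dead reckoning is the correction applied to the flight path.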

Probably the most familiar example of an autonomous
robotic system in the military and civil fields is the
autonomous land vehicle or ALV. There have been and
are many research projects to investigate and build
ALVs. A brief list, which gives some idea of the range
and scope of these projects, is given in appendix A.
The environment that the typical military ALV has
to operate in can be very extreme. Unlike the
controlled and sterile environments found on most
industrial shop floors, a military ALV used in anger would
be expected to be able to function not only in an
outdoor environment where diurnal, climatic, and
seasonal conditions can have a great effect, but also
under the very hostile conditions that are found near and on
the battlefield. It is not surprising that such an
environment is likely to be very unstructured and may
change dramatically. Many ALV projects attempt
to use vision to guide the vehicle and for the obstacle
avoidance function (Bateman 1991, Savage 1991,
Buxton and Roberts 1990, Vacherand et al. 1990,
Wolfe and Chun 1987, Klein et al. 1987,
Simpson 1987, Mitchell and Keirsey 1984).
Localisation can also be achieved by visually identifying
beacons or way markers (Vacherand et al. 1990).
The attraction of vision that is important in the
military environment is that it is passive. However,
active systems such as laser rangers have
also been used (Thorpe et al. 1987, Klein et al. 1987,
Buxton and Roberts 1990).


Although the functional road following ALV is now
nearing reality, the amount of computing power that is
required to drive these robotic systems can be a
limiting factor in achievable performance. This can be
illustrated by the processing power required for one
of the earlier ALV systems at Carnegie-Mellon
University, the NavLab (Thorpe et al. 1987). This system
was built into a Chevrolet van and used a vision and
laser range finder system to guide the vehicle down
a metalled road whilst avoiding any obstacles found
in its path. The processing power required for this
system consisted of the Warp systolic array and 4
Sun computers. At the time this power allowed the
van to travel unassisted at a speed of ~ 2 miles per
hour. Obviously, since the NavLab was first built
computational power has improved. ALVs such as
that built by Professor Dickmanns at the
Universität der Bundeswehr are able to travel at ~ 50 km/h
on metalled, well constructed roads (Dickmanns 1990,
Dickmanns and Graefe 1988). Furthermore, future
improvements to this system are expected to allow
the vehicle to travel on unmetalled tracks, hopefully
over hilly terrain. This system uses a large array
of Transputers coupled with specifically designed
image processing hardware. Other notable systems are
the French ROVA “Autonomous Road Vehicle” (Savage 1991),
the UK MARDI system (Bateman 1991),
the eight wheeled Mars Rover (Spiessbach et al. 1987,
Wilcox et al. 1987) and the six legged ASV (Adaptive
Suspension Vehicle) (Spiessbach et al. 1987,
Klein et al. 1987). Further details of the large
ALV projects are given in appendix A.

Neural  Networks  for  Obstacle  Avoidance  and 
Control 

It was the work on the NavLab that led to the first
real use of neural network technology for the
control of an autonomous vehicle. ALVINN (Autonomous
Land Vehicle in a Neural Network) (Pomerleau 1988,
Touretzky and Pomerleau 1989) demonstrated the
possible advantages to be obtained by the inclusion
of neural network processing for the control of the
vehicle. The idea behind ALVINN was simple: use a
neural network to find the road in visual and laser
ranger images (see figure 1).

An  MLP  (Rumelhart  et  al.  1986)  was  given  pixelated 
image  data  from  both  a  camera  and  a  laser  ranger  on 
board the van. During training the images given were
those  that  would  be  obtained  if  the  van  were  leaving 
or  off  the  road  upon  which  it  was  supposed  to  drive. 
The  MLP  was  then  trained  to  provide  the  correct  con¬ 
trol  signal  (direction  of  motion)  to  bring  the  van  back 
onto  the  road.  Once  trained  the  configured  network, 
when  implemented  on  the  NavLab,  resulted  in  an  im¬ 
provement  by  a  factor  of  two  over  the  processing  speed 
achieved  previously  using  conventional  techniques. 

Figure 1: Schematic view of the ALVINN architecture.

Given that the ALVINN network used whole pixelated
images as input it is not surprising that the size of
the network is very large even by the standards used
today. Specifically the input consisted of 960 inputs
from a 30 x 32 camera image and 256 from an 8 x
32 image obtained from the laser ranger. The final
configuration of the network consisted of:

•  1216  input  units, 

•  29  hidden  units, 

•  46  output  units. 

The output units encoded a linear representation of
the turning radius the vehicle should take, with the
tightest radius to the left being indicated by the
leftmost unit, the tightest radius to the right by the
rightmost unit, and straight on by the central unit.
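The layer sizes and the output coding just described can be illustrated with a short sketch. It is not the ALVINN implementation: the weights are random, the sigmoid activation is assumed, and the names make_layer, forward and steering_from_outputs, together with the mapping of the winning unit onto a value between -1 and +1, are introduced here purely for illustration.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 1216, 29, 46  # layer sizes quoted in the text


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """Random weights, one row per unit; the last entry of each row is a bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    """Activations of one fully connected layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + row[-1]) for row in layer]


def steering_from_outputs(outputs):
    """Map the most active output unit onto [-1, 1] (assumed convention).

    -1 is the tightest left turn (leftmost unit), +1 the tightest right turn
    (rightmost unit). With an odd number of units the central one maps to 0;
    with an even number the two middle units straddle 0.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    centre = (len(outputs) - 1) / 2.0
    return (best - centre) / centre


rng = random.Random(0)
hidden = make_layer(N_INPUT, N_HIDDEN, rng)
output = make_layer(N_HIDDEN, N_OUTPUT, rng)
image = [rng.random() for _ in range(N_INPUT)]  # stand-in for 960 camera + 256 ranger values
activations = forward(output, forward(hidden, image))
print(len(activations), steering_from_outputs(activations))
```

With untrained weights the steering value is of course meaningless; the sketch only shows how a row of output units can stand for a range of turning radii.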

Obviously to train a network of this size required
immense amounts of both data and computational time.
To this end, since it was difficult to gain time on the
NavLab, data for training was simulated using actual
data gathered from the vehicle as a template.
Although this meant that real data was not used to train
the networks it had the advantage that data could be
generated that simulated the vehicle leaving or off the
road without having to place the vehicle in such a
predicament.

Alternative  approaches  to  reducing  the  amount  of 
computation,  applied  in  research  elsewhere,  have  in¬ 
volved  the  use  of  processed  visual  data.  Here,  rather 
than  input  whole  pixelated  images,  the  image  may 
be  processed  first  using  computer  vision  methods 
which  are  able  to  extract  the  salient  features  in  the 
image: regions and their statistical features, for
instance. This processed data may then be fed into
a much reduced network which will, therefore, have
much smaller overheads in terms of the data
required and the time required to train the network
(Hutchinson 1990, Carpenter and Grossberg 1987,
Jamison and Schalkoff 1988). An example of this can
be found in the work of Wright (Wright 1989). Here,
region features obtained from a segmented image (see
figures 2 & 3) are input to a network which is
subsequently trained to identify and label the road-like
regions in the image (see figure 4). Having identified
the road and obtained its position relative to the robot
this information can be used to direct the vehicle.
Such systems are now being prepared as a guidance
mechanism for the MARDI ALV (Bateman 1991).

Figure 2: Road image.

Figure 3: Segmentation of road image.

Other techniques use more structured networks which
have a much reduced connectivity
(Fukushima and Miyake 1982), which facilitates
training on complete images.

Figure 4: Segmentation of the road image with the
regions labelled by the neural network as road displayed
in black.

The  inclusion  of  neural  networks  to  carry  out  the  ob¬ 
ject/obstacle  detection  and  subsequent  motion  control 
has  been  further  developed  and  demonstrated  at  Fu¬ 
jitsu  (Watanabe  et  al.  1989)  and  MIT/University  of 
Boston  (Baloch  and  Waxman  1990).  Both  these  sys¬ 
tems,  which  are  described  further  in  the  third  case 
study,  use  a  hierarchy  of  networks  to  process  the  data. 

Neural  Networks  for  Path  Planning  &  Locali¬ 
sation 

The initial use of neural networks to perform a path
planning function is exemplified by the work of
Jorgensen. His work addresses the problem of
determining a navigational path in a number of different
room environments (Jorgensen 1987). Here a sonar
map of each room was obtained by recording, after
extensive pre-processing, eight 180° sonar scans
obtained from different positions in each room. These
recordings were stored in a modified Hopfield
network (Hopfield 1982), i.e. one in which the neurons could
adopt a continuous value between 0 and 1. A rectangular grid
of 1024 square cells was used to represent each room
and a unique neuron from the Hopfield network was
identified with an individual cell of the grid. The level
of activity of that neuron indicated the sonar activity
at that point. The idea of dividing the robot’s
environment into a grid is not a new one: for example
the idea of Certainty Grids had been used earlier by
Thorpe (Thorpe 1984) and Moravec (Moravec 1986)
at Carnegie-Mellon University, and this is the basis
of the common “free space” approach, which can be
used for obstacle avoidance and is described in the first
case study.

During the recall phase, the robot was given a
single view of the room and the sonar return from that
point was used to prompt the Hopfield network’s
associative memory to complete the interior of the room.
Having obtained a map of the room the path could
then be computed.
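The pattern completion step can be illustrated on a toy scale. The sketch below is a deliberate simplification and not Jorgensen's system: it uses a classical binary (+1/-1) Hopfield network with Hebbian weights rather than the continuous-valued variant described above, and an 8 cell "room" instead of 1024 cells. The names train_hopfield and recall and the two stored patterns are hypothetical.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with no self connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, probe, max_sweeps=10):
    """Asynchronous updates, cell by cell, until a sweep changes nothing."""
    s = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# +1 marks a cell with sonar activity, -1 an empty cell (toy maps).
room_a = [1, 1, 1, 1, -1, -1, -1, -1]
room_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([room_a, room_b])

probe = [-1, 1, 1, 1, -1, -1, -1, -1]  # room_a with its first cell corrupted
print(recall(weights, probe) == room_a)  # True
```

A corrupted or partial view of a stored room is pulled back to the nearest stored map, which is the behaviour the recall phase relies upon.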

This  method  is  limited  by  the  storage  capacity  of  the 
Hopfield  network  (Amit  et  al.  1985).  The  use  of  a  net¬ 
work  with  1024  neurons  meant  that  37  room  patterns 
could  be  stored  with  little  problem,  although  in  prac¬ 
tice  only  10  rooms  were  stored.  The  system  was  imple¬ 
mented  on  the  Oak  Ridge  National  Laboratory’s  mo¬ 
bile  robot  HERMIES  (Hostile  Environment  Robotic 
Machine Intelligence Experimental Series) with the
sonar  system  placed  around  the  body  of  the  vehicle. 

The method, however, was seriously limited by the
considerable storage required for the synaptic weights
(there are n² synapses for a fully connected n-neuron
network). This meant that the computations required
for this associative recall took nearly 3 hours on
the robot’s on-board PC AT. Replacing the PC host
with a 4 node NCUBE gave a sizable speed up, but the
resultant speed and the limited recall of the Hopfield
network limited this approach (Jorgensen 1987). The
idea of grid localisation using a neural network has
been adopted elsewhere (Tarassenko et al. 1991), and
this work forms the central element of one of the case
studies presented here.
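The scale of the storage problem can be checked with a few lines of arithmetic. In the sketch below the 4 byte weight size is an assumption made only to give a feel for the memory involved, and the capacity formula n / (4 ln n) is a commonly quoted estimate for error-free recall that is assumed here; the source does not state how its figure of 37 patterns was obtained, only that the two happen to agree for n = 1024.

```python
import math

n = 1024                      # neurons, one per grid cell (from the text)
n_weights = n * n             # fully connected network: n squared synapses
bytes_needed = n_weights * 4  # assuming 4 byte floating point weights (hypothetical)

# Assumed capacity estimate for error-free recall: n / (4 ln n).
capacity = n / (4 * math.log(n))

print(n_weights)                     # 1048576
print(bytes_needed // 2**20, "MiB")  # 4 MiB
print(round(capacity, 1))            # 36.9
```

Over a million weights must be visited on every update sweep of the network, which is consistent with the long recall times reported on the PC AT.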

Neural Controllers

Although the general subject of the application of
neural networks to control systems falls somewhat
outside the scope of this paper, their use is
important and considered worthy of mention. The
use of neural networks for the control of a vehicle’s
motion has been taken up by many workers in the
field. Possibly one of the best known examples is that of
Widrow with “The truck backer up” (Widrow 1990).
Other work has used various strategies: e.g. networks
have been used to provide a trainable inverse
model of a system based on the input/output
observations of the plant (Kawato et al. 1987,
Chen and Pao 1989). The inverse model is then used
to generate control signals.

Other methods use two networks, one to model the
control response of the system and the other to
produce control decisions. Many of the recent
developments in this area within Europe have been
reported in the proceedings of the IEE conference
Control 91. A large proportion of this work has
concentrated upon exploitation of the non-linear and
adaptive nature of a neural network to provide the
adaptive feed-back controller that is central to some
non-linear predictor adaptive control systems (see figure
5). Such systems have great relevance to the
driverless, or pilotless, vehicle. Here, without a human
controller, the vehicle will have to be able to adapt
both to slow changes in the characteristics of the
vehicle’s performance, e.g. the lightening of the vehicle
as the fuel load decreases, and, more importantly, to
sudden changes, e.g. sudden changes in terrain, weight
changes caused by the delivery of munitions, or, as
has also been suggested, damage to control surfaces on
aircraft (White and Sofge 1991).

Figure 5: Predictor Adaptive Gain Control Architecture

A  good  example  of  the  use  of  neural  networks  in  adap¬ 
tive  control  can  be  found  in  the  papers  by  Brown  et 
al  and  Ince  et  al.  Here  a  recurrent  layered  network  is 
used  to  model  the  non-linear  response  of  the  vehicle 
to  the  control  signals  it  is  given  (Brown  et  al.  1991, 
Ince  et  al.  1991).  The  controller  network  runs  in  par¬ 
allel  to  the  predictor  model  and  adapts  as  the  vehicle’s 
response  changes  by  back  propagating  an  error  signal 
generated  by  differencing  the  output  of  the  reference 
model  and  the  actual  response  obtained  from  the  vehi¬ 
cle  that  the  network  is  modelling.  The  network  there¬ 
fore  acts  as  an  adaptive  gain  feed-back  controller  (see 
figure  5)  for  the  vehicle  which,  as  is  demonstrated  in 
Brown  et  al  and  Ince  et  al,  can  be  integrated  directly 
into  a  conventional  predictor  controller. 
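The principle of adapting a controller from the difference between a reference model and the measured response can be shown with a one-parameter stand-in. This is not the recurrent network of Brown et al or Ince et al: the plant is assumed to be a static gain, the controller is a single adjustable gain, and the function adapt_gain with all of its parameters is hypothetical. The update is plain gradient descent on the squared error, the one-weight analogue of back propagating the error signal.

```python
def adapt_gain(k_plant, k_ref, steps=200, rate=0.1, gain=0.0):
    """Adapt a controller gain so the plant output tracks a reference model.

    Plant (assumed): y = k_plant * u. Reference model: y_ref = k_ref * r.
    Controller: u = gain * r.
    """
    for t in range(steps):
        r = 1.0 if (t // 10) % 2 == 0 else -1.0  # square-wave command
        u = gain * r
        y = k_plant * u          # measured vehicle response
        y_ref = k_ref * r        # desired response
        error = y - y_ref
        # Gradient of 0.5 * error**2 with respect to gain is error * k_plant * r.
        # k_plant is assumed positive and its magnitude is folded into the rate,
        # so the update is stable only while 0 < rate * k_plant < 2.
        gain -= rate * error * r
    return gain


gain = adapt_gain(k_plant=2.0, k_ref=1.0)
print(round(gain, 3))  # 0.5

# A sudden change in the plant (for example a lighter vehicle) is tracked
# by continuing the adaptation from the current gain.
gain = adapt_gain(k_plant=4.0, k_ref=1.0, gain=gain)
print(round(gain, 3))  # 0.25
```

The gain settles at k_ref / k_plant in each case, so the closed loop keeps matching the reference model after the plant has changed.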

Neural  Hardware 

The brief review given above has tried to give an idea
of the breadth of work on the application of neural
networks in the areas of sensing, control, path planning,
and obstacle avoidance. The more recent work in this
area has started to demonstrate the advantages to be
gained from the use of these systems. However, it is
the contention of the author that the true worth of
using a neural network cannot be realised unless the
network that has been designed can be implemented on
appropriate hardware and integrated into a complete
processing system. This view has particular merit in
the subject area that concerns this paper. Here any
working system has to be realised in hardware that
will provide the appropriate real time performance.
Furthermore, this hardware must be small enough and
flexible enough to fit into the control system of a
mobile robot. These constraints can be particularly harsh
in military environments*.

It  may  be  argued  that  the  potentially  high  speed,  com¬ 
pact  nature  of  a  neural  network,  once  implemented  on 
the  appropriate  hardware  technology,  is  perhaps  the 
greatest  advantage  of  these  systems  over  and  above 
that  of  more  conventional  processing  techniques.  Ob¬ 
viously,  this  is  not  the  only  view  of  the  worth  of  neural 
networks,  but  it  is  a  view  that  is  of  great  importance 
in  the  field  of  mobile  robotics.  Although  neural  sys¬ 
tems  generally  do  not  easily  map  onto  conventional 
sequential  or  parallel  hardware  all  the  systems  that 
have  been  mentioned  so  far  in  this  paper  have  used 
some  form  of  on  board  non-neural  processor.  With 
the  advent  of  dedicated  hardware  (LeCun  et  al.  1990, 
Murray  et  al.  1990,  Holler  et  al.  1989)  a  further  re¬ 
duction  in  size  and  increase  in  performance  can  now 
be  anticipated.  Perhaps  the  first  example  of  the  use  of 
such  hardware  is  the  work  carried  out  by  the  Robotics 
Group  at  Oxford  University  (Tarassenko  et  al.  1991), 
a  description  of  which  forms  one  of  the  case  studies 
which  are  now  described. 

Case  Study  1:  Obstacle  Avoidance  Us¬ 
ing  an  Ultra  Sonic  Array 

The  use  of  data  from  ultra-sonic  arrays,  or 
for  that  matter  other  dense  range  dependent 
data,  for  obstacle  avoidance  is  quite  widespread 
(Buxton  and  Roberts  1990,  Jorgensen  1987).  The 
techniques  developed  to  provide  an  obstacle  avoidance 
function  using  this  type  of  data  divide  into  two. 

Configuration  space:  this  is  a  derivative  of  the  cer¬ 
tainty  grid  (Elfes  1987)  idea  that  was  explained 
earlier.  Here  a  dense  map  of  the  environment  is 
obtained  via  either  an  active  or  passive  sensor. 
This map is then used to compute “free space
corridors”, which allow for the size of the vehicle,
around obstacles present in the environment.
There  are  many  difficulties  with  this  method:  to 
generate  the  configuration  space  requires  a  large 
amount  of  processing,  the  method  is  not  body 
centred  and  therefore  the  view  of  the  environ¬ 
ment  may  not  be  consistent  with  the  view  seen 
from  the  vehicle  once  it  has  moved  to  a  differ¬ 
ent  position,  and  without  continuous  updating 
the  method  cannot  cope  with  moving  obstacles. 
The  method  is  characterised  by  an  explicit  re¬ 
calculation  of  the  robot’s  path  around  the  obsta¬ 
cle. 

Potential field methods: this method (Khatib 1986)
usually relies upon monitoring the signature of an
array of sensors capable of generating range dependent
data in a dense pattern around the vehicle. Simply, this
method uses the range data to determine the
position of obstacles relative to the robot. These
obstacles are then considered to have a repulsive
potential which repels the robot and so prevents
the vehicle from hitting the obstacle. Unlike the
previous method this technique is body centred
and, since the method uses data that is continually
updated, is able to deal with moving obstacles.
Here, the robot moves “relatively” with an
implicit re-calculation of the path.

* I intend to leave the difficult question of verification of such
neural systems until the conclusion.
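The potential field idea can be sketched in a few lines. The force law below follows the usual form of the repulsive potential, U = 0.5 (1/d - 1/d0)² inside an influence distance d0 and zero outside it, with unit gain; the function name repulsive_command, the body frame convention (x forward, y to the left) and the 2 m influence distance are assumptions made for illustration.

```python
import math


def repulsive_command(readings, influence=2.0):
    """Sum the repulsive contributions of a set of range readings.

    readings: list of (bearing_rad, range_m) pairs in the robot's body frame.
    Returns (fx, fy), a vector pointing away from nearby obstacles.
    """
    fx = fy = 0.0
    for bearing, dist in readings:
        if 0.0 < dist < influence:
            # Magnitude of the gradient of 0.5 * (1/d - 1/d0)**2 with unit gain:
            # zero at the influence distance and growing rapidly as d shrinks.
            mag = (1.0 / dist - 1.0 / influence) / (dist * dist)
            fx -= mag * math.cos(bearing)
            fy -= mag * math.sin(bearing)
    return fx, fy


ahead = repulsive_command([(0.0, 0.5)])            # obstacle straight ahead
left = repulsive_command([(math.pi / 2.0, 0.5)])   # obstacle to the left
far = repulsive_command([(0.0, 5.0)])              # beyond the influence distance
print(ahead, left, far)
```

Because the readings are taken afresh at every step in the robot's own frame, no explicit map or path is stored, which is the sense in which the re-calculation of the path is implicit.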


The work described here is the result of an
investigation carried out by the German company IBP Pietzsch
for the ANNIE project into the use of neural network
architectures for reactive obstacle avoidance.
Ultrasonic signatures are processed to produce the control
signal necessary to ensure that the vehicle avoids
obstacles placed in its path. The method that is
developed here is somewhat similar to the potential field
method described briefly above. Although the results
of this investigation are obtained via simulation and
do not use real data it is hoped that they give a graphic
description of the potential use of neural networks for
sensor/motor integration. A more detailed description
follows.

Network  Implementation 

Briefly the simulation used is composed of:

• a mobile robot that is equipped with 9 ultra-sonic
sensors, as is shown in figure 6. These 9 sensors
are arranged in groups: 4 pointing forward, 2 on
each side of the robot pointing to the left and to
the right, and one sensor pointing to the rear,

• an environment in which the vehicle moves,
consisting of a room containing obstacles of differing
shape and complexity (see figure 7).

The  robot  is  allowed  to  move  through  the  environment 
using  8  possible  motions: 

1. stop,

2. fast forward,

3. slow forward,

4. turn right by 45°,

5. turn left by 45°,

6. turn right by 90°,

7. turn left by 90°,

8. slow backwards.

Figure 6: Control Architecture for Simulated Vehicle


Figure  7:  The  5  primitives  used  to  construct  obstacles 
for  simulated  environment 


As the control architecture diagram (figure 6) suggests,
the output from the ultra-sonic sensors was
extensively preprocessed. The preprocessing range gated
the sensor output, before it was put into the neural
network, into 15 range values that were spaced
logarithmically. To ensure that this range data was
presented to the network in a robust manner the range
values from each of the sensors were encoded in a 4 bit
coding* designed such that the codes for neighbouring
range gates were separated by a small Hamming
distance. Thus similar codes would be obtained for range
values that just fell either side of a range boundary.

The 4 bit coding from each of the 9 sensors gave a 36
bit  binary  input  to  the  network  that  was  used,  which 
was  a  3  layer  MLP.  The  output  layer  of  this  MLP  con¬ 
sisted  of  3  units.  These  encoded  the  8  possible  control 
instructions  that  the  robot  should  receive.  Again,  as 
in  the  input  coding  it  was  ensured  that  this  coding 
was  robust  to  small  fluctuations  and  so  similar  mo¬ 
tions  were  given  codings  separated  by  small  Hamming 
distances. 
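The input coding described above could be sketched as follows. The source does not say which 4 bit code was used; a reflected Gray code is assumed here because it is the standard code in which neighbouring values differ in exactly one bit. The range limits r_min and r_max and the function names are likewise hypothetical.

```python
import math

N_GATES = 15  # number of logarithmically spaced range gates (from the text)


def range_gate(distance, r_min=0.1, r_max=5.0):
    """Index (0..14) of the logarithmic range gate containing a distance."""
    d = min(max(distance, r_min), r_max)
    frac = math.log(d / r_min) / math.log(r_max / r_min)
    return min(int(frac * N_GATES), N_GATES - 1)


def gray_code(index):
    """4 bit reflected Gray code of a gate index, most significant bit first."""
    g = index ^ (index >> 1)
    return [(g >> b) & 1 for b in (3, 2, 1, 0)]


def encode_sensors(distances):
    """Concatenate the 4 bit codes of all sensors into one binary input vector."""
    bits = []
    for d in distances:
        bits.extend(gray_code(range_gate(d)))
    return bits


readings = [0.3, 0.4, 1.2, 2.5, 0.8, 0.9, 4.0, 3.1, 0.15]  # 9 simulated ranges in metres
network_input = encode_sensors(readings)
print(len(network_input))  # 36
```

With this choice a reading that drifts across a gate boundary changes only one of the 36 input bits, which is the robustness property the text asks of the coding.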

To  train  the  network  the  vehicle  was  placed  repeat¬ 
edly  in  close  proximity  to  10  typical  rectilinear  obsta¬ 
cles  such  as:  corners,  corridors,  walls,  and  walls  with 
openings  (see  figure  8).  To  ensure  that  the  vehicle  is 
able  to  meet  all  the  situations  that  it  may  find  itself 
in,  the  position  and  orientation  of  the  vehicle  was  also 
varied.  This  ensured  that  the  configuration  generated 
on  the  network  during  training  was  as  general  as  pos¬ 
sible.  In  presenting  the  vehicle  to  the  various  obstacles 
the  sensor  signals  from  the  9  sensors  were  generated 
and  this  data  together  with  a  motor  response  signal 
given  by  an  operator  was  given  to  the  network  to  allow 
it  to  train. 

As  with  all  MLP  simulations  the  precise  construction 
of  the  network  is  not  clear  at  the  outset  and  empir¬ 
ical  data  has  to  be  gathered  to  determine  the  num¬ 
ber  of  hidden  units  and  to  set  the  back  propagation 
(Rumelhart  et  al.  1988)  parameters.  The  final  config¬ 
uration  obtained  from  these  experiments  was  an  MLP 
with: 


•  36  input, 

•  8  hidden  (the  only  unknown  variable), 

•  3  output. 

For this straightforward problem satisfactory
convergence was obtained after the repeated presentation of
10 obstacles as shown in figure 8. The slow nature of
the MLP error back-propagation required over 20,000
presentations of the 10 obstacles.

The  slow  learning  rate  obtained  has  since  been  greatly 
improved  by  the  use  of  direct  analogue  input  from  the 
sensors  themselves.  Here  a  slightly  different  architec¬ 
ture  has  been  used. 


* Obviously this assumes that a perfect return signature is
obtainable from these sensors, which is usually not possible.


• 10 inputs: 9 returns from the ultra-sonic
sensors, which are inverted, together with the current
speed of the vehicle,

• 3 hidden units,

• 2 output units, indicating the change in the
vehicle’s speed and angle of turn.

Although the performance of this network is not
radically different from the previous design the use of
analogue inputs allows the network to be much smaller.
The small size of the network allows the training to
be accomplished much more easily.
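The difference in size between the two designs can be made concrete by counting adjustable weights. The count below assumes fully connected layers with one bias per unit, which the source does not state; mlp_weight_count is a name introduced for this illustration.

```python
def mlp_weight_count(layer_sizes, bias=True):
    """Number of adjustable weights in a fully connected layered network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += (n_in + (1 if bias else 0)) * n_out
    return total


binary_net = mlp_weight_count([36, 8, 3])     # binary coded input design
analogue_net = mlp_weight_count([10, 3, 2])   # direct analogue input design
print(binary_net, analogue_net)  # 323 41
```

Under these assumptions the analogue design has roughly an eighth of the weights of the binary design, which is in keeping with the easier training reported.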

Discussion 

Once trained it was found that the network was able to
negotiate  perfectly  the  obstacles  it  was  given  to  train 
upon.  Testing  the  network  on  a  set  of  rectilinear  ob¬ 
stacles  upon  which  it  had  not  been  trained  indicated 
that  the  network  could  generalise  to  obstacles  with 
which  it  was  not  familiar.  The  network  was  able  to 
negotiate  the  new  obstacles  only  failing  to  avoid  these 
in  a  small  percentage  (1%)  of  the  cases.  Furthermore, 
this  performance  could  be  increased  by  retraining  the 
network  on  those  cases  which  it  found  difficult.  Per¬ 
haps  surprisingly  it  was  found  that  the  performance 
of  the  neural  controller  depended  heavily  upon  the 
identity  of  the  operator  who  was  used  to  give  the  di¬ 
rection  of  motion  of  the  vehicle  for  each  training  situa¬ 
tion.  This  highlights  an  important  point  regarding  the 
adaptive  nature  of  a  neural  network,  in  that  the  final 
configuration  of  the  network  can  be  heavily  dependent 
upon  not  only  the  nature  of  the  training  data  used  to 
configure  it  but  also  the  way  in  which  that  data  is 
presented. 

The  dependency  of  the  final  configuration  of  a  neu¬ 
ral  network  after  training  with  respect  to  these  fac¬ 
tors  obviously  has  great  bearing  upon  the  variability 
of  such  systems.  This  variability  can  be  reduced  if 
the  data  used  to  train  the  network  and  the  way  that 
data  is  presented  is  tightly  specified.  This  point  and 
others related to the verification of these systems are
discussed  further  in  the  last  section. 

Although  limited  in  its  scope  it  is  the  intention  of  this 
case  study  to  demonstrate  how  a  neural  network  can 
be  used  to  perform  a  sensor/motor  association.  The 
use  of  a  neural  network  to  produce  a  reactive  motor 
response  to  a  given  stimulus  has  many  advantages  over 
the  more  conventional  approaches  that  have  been  de¬ 
scribed.  Apart  from  the  speed  and  the  compact  nature 
of  these  devices  once  implemented  in  VLSI  silicon  the 
highly  parallel  nature  of  these  systems  (which  allows 
data  from  many  processors  to  be  processed  simultane¬ 
ously)  coupled  with  their  adaptability  (which  allows 
data  to  be  processed  without  the  requirement  for  di¬ 
rect  calibration  since  this  can  naturally  be  configured 
into the network during training) makes neural
systems very useful for reactive control. The use of neural
networks to perform this association has been carried
out successfully in a number of other areas related to
robot control (Waxman et al. 198h, Peterson 1991).

Case  Study  2:  Localisation  from  Off 
Board  Sensors 

The work presented here represents part of that
carried out by British Aerospace’s corporate research
centre, The Sowerby Research Centre, for the ESPRIT II
project ANNIE. The investigation is concerned with
the localisation of a robotic vehicle. However, rather
than performing this localisation using sensors placed
on the robot the investigation is concerned with the
somewhat different problem of localising the robot
using a surveillance system separate from it. It is
assumed, not unreasonably, that the surveillance
system is able to provide both the bearing and range of
the objects it detects but is not consequently able to
identify the object. Localisation of a known vehicle
is, therefore, not possible if there are other targets
present without the use of prior knowledge such as
the robot’s approximate position or the identity of the
objects.

In  general,  military  systems  overcome  this  problem  by 
using  say  IFF  techniques  or  allowing  the  vehicle  in  the 
field to determine its position against a known frame
of  reference,  using  GPS  for  instance,  and  communi¬ 
cating  this  back  to  the  surveillance  system.  However, 
the  use  of  either  of  these  systems  is  not  always  desir¬ 
able  or  possible.  An  alternative  is  to  allow  the  vehi¬ 
cle  to  simply  communicate  to  the  surveillance  system 
the  present  trajectory  and  then  through  a  process  of 
“data  fusion”  determine  which  surveillance  track  best 
matches  the  trajectory  and  so  identify  and  localise 
the  vehicle.  This  last  alternative  has  the  advantage  in 
that  it  does  not  rely  upon  an  external  system  such  as 
a  satellite,  nor  would  it  be  easy  to  jam  or  suffer  from 
external  interference.  To  perform  this  “data  fusion” 
however,  which  requires  the  data  received  from  a  ve¬ 
hicle  to  be  correlated  with  all  the  objects  detected  by 
the  surveillance  system,  may  require  some  very  inten¬ 
sive  computing.  Furthermore,  the  correlation  between 
the  signals  may  not  be  obvious  and  could  well  be  non¬ 
linear. 

This  investigation  looks  at  the  possibility  of  using  a 
neural  network  to  perform  this  correlation  in  the  hope 
that  the  high  bandwidth  and  non-linear  adaptability 
of  neural  networks  will  be  of  advantage,  and  demon¬ 
strates  this  in  a  real  laboratory  environment.  To  carry 
this  out  the  investigation  exploited  a  distributed  real 
time  surveillance  system  that  had  already  been  built 
for the ESPRIT I project SKIDS* and constructed in
a 10 m x 10 m room in one of the laboratories at The
Sowerby  Research  Centre.  This  environment  together 
with  the  mobile  robot  that  was  used  for  the  investiga¬ 
tion  are  described  further  in  the  following  section. 

* Signal and Knowledge Integration with Decisional control for
multi-sensory Systems

Figure 9: Schematic View of SKIDS Environment

The  Environment 

The SKIDS surveillance, or tracking, system
monitors the room through the use of four monochrome
CCD cameras, one mounted at each of the four
corners of the room (see figure 9). The images of the
room form reference frames which are stored by the
SKIDS machine and constantly updated. An event
within the room is detected by differencing the current
camera image and the reference frame. The
resulting differenced image is then thresholded and grouped
into regions corresponding to the moving objects in
the room. From the least enclosing rectangle, which
is computed around each region, the position of the
event can be determined by projecting the bottom
of the rectangle onto the floor. The image
processing required to perform the segmentation is
computationally very demanding. A parallel processor and a
specialised image processing engine are therefore
employed within the SKIDS system to provide sufficient
computational power. A Datacube pipeline image
processor is used to acquire and scale images from the
CCD cameras and a Transputer array provides the
parallel processing support required for the
remaining image processing tasks. In this way the SKIDS
machine supports the real time detection,
positioning, and tracking of all events within the room (the
system typically operates at a sampling rate of about
6 Hz).
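The detection steps described above (differencing, thresholding, grouping into regions and taking the least enclosing rectangle) can be illustrated on a toy image. This is a sketch only, not the SKIDS implementation: the threshold value, the use of 4-connected regions and the names detect_events and foot_point are assumptions, and the projection of the image point onto the floor, which needs the camera calibration, is left out.

```python
def detect_events(frame, reference, threshold=30):
    """Return the enclosing rectangle (top, left, bottom, right) of each changed region."""
    rows, cols = len(frame), len(frame[0])
    mask = [[abs(frame[r][c] - reference[r][c]) > threshold for c in range(cols)]
            for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Flood fill one 4-connected region, tracking its extent.
                stack = [(r, c)]
                seen[r][c] = True
                top, bottom, left, right = r, r, c, c
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def foot_point(box):
    """Image point at the middle of the bottom edge of a rectangle (row, column)."""
    top, left, bottom, right = box
    return bottom, (left + right) / 2.0


reference = [[0] * 6 for _ in range(5)]   # empty room
frame = [row[:] for row in reference]
for r in range(1, 4):                     # one bright object, 3 rows by 2 columns
    for c in range(2, 4):
        frame[r][c] = 200
boxes = detect_events(frame, reference)
print(boxes, foot_point(boxes[0]))  # [(1, 2, 3, 3)] (3, 2.5)
```

Even this simple version visits every pixel of every frame, which gives some feel for why dedicated image processing hardware was needed to sustain the sampling rate quoted.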

Apart  from  objects  such  as  humans  and  other  vehicles 
the  environment  also  contains  a  mobile  robot.  It  is  in¬ 
tended  that  the  robot  should  be  under  the  control  of 
the  SKIDS  machine.  The  function  of  the  robot  is  to 
carry  a  sensor  suite  to  “remote”  parts  of  the  environ¬ 
ment,  for  example  to  perform  a  localised  inspection 
task.  However,  the  localisation  of  the  robot  from  the 
SKIDS  track  data  alone  is  not  possible  without  first 
identifying which event corresponds to the vehicle.


Figure  10:  Large  Multi-input  network 

The  mobile  robot  used  for  the  purposes  of  this  inves¬ 
tigation  is  the  Robosoft  Robuter.  The  Robuter  has 
two drive wheels mounted at the back with two castors at the front. Optical shaft encoders are attached to the two drive wheels in order to monitor and constantly feed back the motion of the robot wheels. The
Robuter  can  be  controlled  remotely  from  a  Sun4  work¬ 
station  via  an  RS232  radio  link  which  sends  movement 
and  measurement  commands  to  the  on-board  operat¬ 
ing  system.  This  operating  system  is  based  around  a 
68020  microprocessor  and  supports  movement  control 
and sampling of the odometry obtained from shaft encoders on the wheels.

This study describes how the “data fusion” between the event data produced by the SKIDS machine and the robot trajectory as given by its odometry can be carried out by a neural network. The network identifies, independently of position and orientation, the SKIDS event that corresponds to the robot, which can then be localised.

Network  Implementation 

Two  successful  network  implementations  have  been 
produced.  Both  are  based  upon  the  MLP  and  use 
error-back  propagation  (Rumelhart  et  al.  1986).  The 
first  network  consists  of  a  large  input  layer.  This,  as 
can  be  seen  from  figure  10,  comprises  inputs  from  each 
SKIDS  track  together  with  an  equivalent  signature  ob¬ 
tained  from  the  robot  odometry.  This  investigation 
tried  several  different  signatures  based  upon  spatial 
or  angular  decompositions  of  the  SKIDS  event  and 
robot velocities. The best performance was obtained from the following two reference frame independent signatures:

•  object  speed  and  angular  velocity, 

•  object  speed  and  acceleration. 

The fact that these signatures are found to be the best is not so surprising. The velocity of the vehicle is a relative measurement which can be obtained directly from the vehicle odometry without the problems induced by systematic errors that would affect positional measurements. The use of velocity, therefore, is position independent. Furthermore, orientation independence can be obtained if scalar, rather than vector, measures such as speed are used.
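
The frame independence of the first signature can be illustrated with a short sketch. It assumes a track is a list of (x, y) positions sampled at a fixed interval dt; the function names are hypothetical and are not taken from the SKIDS software. Speed and angular velocity are computed from differences between successive samples, so translating or rotating the whole track leaves the signature unchanged.

```python
import math

def speed_and_turn_rate(positions, dt):
    """Return (speed, angular velocity) pairs from sampled (x, y) positions."""
    # Velocity vectors between consecutive position samples.
    velocities = [((x1 - x0) / dt, (y1 - y0) / dt)
                  for (x0, y0), (x1, y1) in zip(positions, positions[1:])]
    signature = []
    for (vx0, vy0), (vx1, vy1) in zip(velocities, velocities[1:]):
        speed = math.hypot(vx1, vy1)
        # Signed heading change between successive velocity vectors,
        # from their cross and dot products (both rotation invariant).
        dtheta = math.atan2(vx0 * vy1 - vy0 * vx1, vx0 * vx1 + vy0 * vy1)
        signature.append((speed, dtheta / dt))
    return signature

def transform(positions, angle, tx, ty):
    """Rotate a track by `angle` and translate it by (tx, ty)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in positions]
```

Applying `speed_and_turn_rate` to a track and to `transform(track, angle, tx, ty)` should give the same values up to rounding. The angular velocity is not meaningful while the object is stationary.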

The output of the network used n units, where n represented the number of SKIDS tracks that the network was designed for. For the case shown in figure 10, n = 3. This allowed a 1-from-n coding to be used to encode the output. Here a high value at the nth output indicates a robot signature match with the nth SKIDS input. It has been shown by MacKay (MacKay 1987) among others that this output coding, provided the network has been trained in the correct manner, allows the values given at the output to be interpreted as a confidence in the nth interpretation. Since in certain circumstances one or more SKIDS events are not distinguishable from the real event, this coding allowed the network to give a result which reflects the level of confusion.

To  prevent  any  bias  being  introduced  during  train¬ 
ing  the  position  of  the  robot  track  in  the  training 
data  was  randomised.  After  some  experimentation  the 
most  suitable  network  configuration  for  identifying  the 
robot  from  3  SKIDS  events  was  found  to  be: 

•  8  input  units, 

•  8  hidden  units, 

•  3  output  units. 
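
A minimal sketch of a network of this shape is given below. It makes several assumptions that are not stated in the text: the 8 inputs are taken to be a two-value signature for each of the 3 SKIDS tracks plus the robot, the units are sigmoidal, and the weights are random placeholders rather than trained values. The `decode` function shows how the 1-from-n outputs might be read as confidences, so that two high outputs indicate two robot candidates.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer of sigmoid units."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def make_mlp(n_in=8, n_hidden=8, n_out=3, seed=0):
    """Build an untrained 8-8-3 network with small random weights (placeholders)."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w1": rand_matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "w2": rand_matrix(n_out, n_hidden), "b2": [0.0] * n_out}

def forward(net, inputs):
    """Propagate one input vector through the hidden and output layers."""
    hidden = layer(inputs, net["w1"], net["b1"])
    return layer(hidden, net["w2"], net["b2"])

def decode(outputs, threshold=0.5):
    """Indices of the tracks whose output confidence exceeds the threshold."""
    return [i for i, value in enumerate(outputs) if value > threshold]
```

An empty list from `decode` corresponds to the case where no track is labelled with enough confidence, and a list with two entries to the case of two robot candidates. The threshold of 0.5 is an arbitrary illustrative choice.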

Upon  testing  of  the  network  a  ~  90  %  success  rate  on 
data  different  from  that  used  to  train  it  was  found. 
Furthermore,  although  as  the  percentage  success  rate 
suggests  in  some  cases  the  network  was  unable  to  iden¬ 
tify  to  which  track  the  robot  odometry  belonged,  this 
was  usually  because  the  network  was  unable  to  label 
a  track  as  coming  from  the  robot  with  enough  confi¬ 
dence,  as  reflected  by  the  value  given  at  the  output  of 
the  network,  for  unambiguous  recognition.  Since  the 
majority  of  such  cases  resulted  from  situations  where 
the  network  indicated  that  there  were  two  robot  can¬ 
didates,  one  of  which  was  the  correct  solution,  the 
level of misclassification was much smaller than
suggested  by  the  above  result. 

The disadvantage in using the large multi-input network that is described here is that the system does not have any inherent ability to scale. To change the system from differentiating between three SKIDS tracks to differentiating between four or five requires the network to be extended and completely retrained. This suggests that although the large network may give desirable results, its lack of flexibility probably precludes its use in a real system.


An  alternative  to  the  use  of  a  single  large  network  is 
to  use  a  hybrid  system  of  several  small  networks  each 
trained  to  determine  if  a  single  SKIDS  event  matches 
the  robot’s  odometry.  This  has  the  advantage  that 
the  individual  networks  that  comprise  such  a  system 
can be trained separately. Furthermore, if this training is carried out appropriately then it is only necessary to train a single network and allow the other networks in the hybrid system to be “carbon copies” of the first. Following this idea, a small MLP was trained
using  error  back-propagation  (Rumelhart  et  al.  1986) 
with  the  same  velocity  signature  data  that  was  found 
to  be  effective  with  the  large  multi-input  network. 

These small networks were simple in construction, as is illustrated in figure 11. The input comprised 4 units, which allowed the odometry signature of the robot and a single SKIDS track to be input to the network. The output, which consisted of just 1 unit, signified whether the SKIDS track given to the network belonged to the robot or not. The typical performance of the network with 3 hidden units was found to be marginally lower (85-90%) than that obtained from the large multi-input network.

The hybrid design has many advantages. As has already been mentioned such a system scales in a much more sensible way than the large multi-input network (as the number of SKIDS events increases, more small networks are simply added). If the networks were implemented
in  parallel,  the  computational  burden  imposed  by  this 
hybrid  system  increases  approximately  linearly  with 
the  number  of  networks  and  therefore  SKIDS  events. 
Furthermore,  since  it  is  possible  that  only  one  small 
network  may  have  to  be  trained  this  greatly  reduces 
the  amount  of  training  and  therefore  data  required  to 
configure  the  whole  system. 

A significant disadvantage of this system, however, is that when the individual small networks are trained, unlike the large multi-input networks, they are not aware of the presence of other events that may have been detected. Not surprisingly, therefore, the output from the hybrid system may not be unique. If the input data is confused then several of the networks may match the robot odometry to the particular SKIDS event that they were given. This explains why slightly lower results were obtained for the hybrid system in comparison with the multi-input system. This problem can be overcome by introducing a “winner takes all” mechanism (Lippmann and Huang 1987) on the outputs of the networks.
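
The hybrid arrangement with a “winner takes all” stage can be sketched as follows. The function `match_score` is a hypothetical stand-in for the trained small network (it simply scores how close two signatures are) and is an assumption made for illustration only. The same matcher is applied to every SKIDS track, which mirrors the “carbon copy” idea, and the strongest output is then selected.

```python
import math

def match_score(robot_sig, track_sig):
    """Hypothetical stand-in for the trained 4-input, 1-output network."""
    d2 = sum((r - t) ** 2 for r, t in zip(robot_sig, track_sig))
    return math.exp(-d2)  # near 1 for similar signatures, near 0 otherwise

def winner_takes_all(scores):
    """Set the largest score to 1 and all the others to 0."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(scores))]

def identify_robot(robot_sig, track_sigs, matcher=match_score):
    """Apply one copy of the matcher per track, then pick a single winner."""
    scores = [matcher(robot_sig, track) for track in track_sigs]
    return winner_takes_all(scores), scores
```

Adding a fourth or fifth track only lengthens `track_sigs`; nothing has to be retrained, which is the scaling property described above.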

An  alternative  to  the  “winner  takes  all”  mechanism 
is to use the temporal continuity of the events generated by the tracker. This exploits the fact that
there  is  a  significant  probability  that  the  identity  of 
an  event  will  remain  the  same  from  one  time  frame 
to  another.  Obviously  this  probability  is  affected  by 
the  amount  of  noise  in  the  system,  and  presence  and 
number  of  other  events  in  the  room  with  which  the 
event  could  become  confused.  This  temporal  con¬ 
tinuity  can  be  exploited  by  allowing  lateral  inhibi¬ 
tion  (Carpenter  and  Grossberg  1987,  Kohonen  1984) 
between  the  outputs  of  the  hybrid  system.  Here  the 
weighted  links  between  the  outputs  adapt  with  time 
such  that  an  output  that  has  had  a  high  value  for  sev¬ 
eral  time  steps  is  enhanced  whilst  the  others  are  di¬ 
minished.  This  serves  to  dampen  fluctuations  in  the 
output  of  the  hybrid  system  such  that  in  the  event 
that  an  output  is  ambiguous  the  network  still  gives  a 
definite  answer. 
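
The effect of lateral inhibition over time can be sketched in a simplified form. The version below is an assumption made for illustration: it uses fixed gain and inhibition constants rather than the adaptive weighted links described above, and the parameter values are arbitrary. Each output keeps a running state that is reinforced by its own network output and suppressed by the states of its rivals.

```python
def inhibit(state, outputs, gain=0.5, inhibition=0.3):
    """One time step: blend in the new outputs and subtract rival activity."""
    n = len(state)
    updated = []
    for i in range(n):
        rivals = [state[j] for j in range(n) if j != i]
        rival_mean = sum(rivals) / len(rivals) if rivals else 0.0
        value = (1.0 - gain) * state[i] + gain * outputs[i] - inhibition * rival_mean
        updated.append(min(1.0, max(0.0, value)))  # keep the state in [0, 1]
    return updated

def track_identity(frames):
    """Run the inhibition step over a sequence of hybrid-system outputs."""
    state = [0.0] * len(frames[0])
    for outputs in frames:
        state = inhibit(state, outputs)
    return state
```

If the first output has been high for several frames and a later frame is ambiguous between the first and second outputs, the accumulated state still favours the first, so the identity does not switch on a single confused frame.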

Discussion 

What has been demonstrated here is the use of a neural network, or networks, to perform the correlation central to the “data fusion” required for the identification of a known vehicle detected by a surveillance system. The localisation that results from this process is relative to the co-ordinate frame of the surveillance system, which may be moving or static. Further, it has been demonstrated that this surveillance system can be distributed and so dispersed throughout the region of interest, which gives an increase in the base line of the system, allowing better positioning to be determined.

Although  this  system,  like  other  identification  meth¬ 
ods,  requires  communication  between  the  tracker  and 
vehicle  or  vehicles,  since  the  system  is  distributed,  line 
of  sight  communication  can  be  carried  out  to  a  variety 
of  points,  reducing  the  risk  of  interference  or  revealing 
the  position  of  the  vehicle. 

This system also presents the alternative possibility where the surveillance system is distributed across the vehicles themselves to form a distributed robotic system. Such a system would consist of a large number of very simple mobile vehicles which are able to communicate with each other. The essential element of this distributed system is that, like a colony of ants, although the individual elements have a low degree of complexity the emergent behaviour of the whole system (if configured correctly) may be extremely complex. To allow the elements (vehicles) of this system to move together and therefore act as a whole it is necessary for each vehicle to know the relative position of the others. The positions of differing objects relative to a particular vehicle can be found via a simple passive or active tracking system. However, as has been shown, it is necessary to identify them first before a particular vehicle may be localised. This, as in the case study, can be carried out by matching the signature of the tracked objects with their odometry as communicated. The location of each identified vehicle can then be determined relative to the whole group.

In  the  military  field  a  robotic  system  such  as  this  has 
several  desirable  qualities.  The  system  is  constructed 
of  many  simple,  and  hopefully,  therefore  cheap,  dis¬ 
posable  elements.  Since  the  system  does  not  depend 
upon  any  single  element  the  system  should  be  able  to 
withstand  a  high  level  of  attrition  without  a  catas¬ 
trophic  effect  upon  the  whole  system’s  performance. 
A  variety  of  possible  applications  come  to  mind  from 
recognition  and  terrain  mapping  to  the  autonomous 
convoying  of  logistic  support  around  a  battle  field. 

Case  Study  3:  Integrated  Localisation 
and  Path  Planning 

The use of neural networks for the control of a mobile robot has already been graphically demonstrated by Waxman and his co-workers (Baloch and Waxman 1990, Waxman et al. 1988) together with, for example, the work carried out at the Fujitsu laboratories (Watanabe et al. 1989). In both cases a hierarchical architecture of neural networks has been designed to perform the differing functions required of the respective systems. In both these systems, however, a large proportion, if not all, of the neural processing is carried out by a static workstation communicating with the robot via a radio link.

The system built by Waxman and his co-workers, for instance, MAVIN (Mobile Adaptive Visual Navigation) (Baloch and Waxman 1990), uses a large hierarchy of networks to perform the processing required, from the robot's camera saccade and gaze control (simple ADALINEs (Widrow and Hoff 1960) are used here) through to object classification (ART 1 (Carpenter and Grossberg 1987) networks are used extensively here). All the networks used in this demonstration were simulated on a SUN 3/60 which communicated with the robot via a radio link. The image processing required was carried out on an ASPEX PIPE 1/800 video rate computer.

The network implementation for the Fujitsu robots is somewhat different from MAVIN. Here the networks that controlled the Fujitsu robots were trained and adapted off the robot on a workstation; the trained networks are then downloaded onto the robots' on-board processor. Although this allows the robot to be self contained, the networks cannot be adapted on the robot itself.

Figure 12: Control Architecture for the Oxford Robot

It is the intention of this case study to highlight the potential further and substantial advances to be gained through the use of neural networks implemented on dedicated VLSI hardware. The work described is that still being undertaken by the Robotics Group at the University of Oxford under Dr. Lionel Tarassenko and in part¹ supported by RSRE² Malvern. The thrust of this work is to build a low-cost, real time mobile navigation system based upon a set of VLSI neural network navigational modules. These modules are based upon the two functional requirements that have been described earlier. This case study gives an overview of the path planning and localisation modules together with a description of how these two modules can be integrated together. Obviously the localisation module directly impinges upon the path planning module; a schematic diagram of the robot control architecture is given in figure 12. Both the path planning and localisation systems operate on a certainty grid idea which has been briefly described earlier.

Localisation 

The localisation system on this robot relies on the certainty grid idea described above. Here a 28-point grid was used to map the robot's environment, see figure 13. In a similar way to that used by Jorgensen (Jorgensen 1987) the environmental characteristics were learned by recording the 360° signature obtained from a time-of-flight optical range finder. This was a phase sensitive near infra-red device that was developed by the research group with this purpose in mind. This device is capable of resolving phase shifts of 0.1° over a 50 dB range. This, as can be seen from figure 14, allows a very detailed range map to be produced. The original work carried out by Oxford in this area concentrated on the use of an ultra-sonic sensor. This required extensive preprocessing before the signatures could be input to the network. The high resolution infra-red scanner mitigates this problem.

Figure 13: Diagrammatic view of the Oxford Laboratory Robot Environment

Figure 14: Range map of the Oxford environment

Given a set of learned signatures the grid system can be used to compute the robot's approximate position. This may be determined by comparing the current signature $x$ with each of the $k$ learned patterns $u_i$, which correspond to the signature of the range finder at each of the $k$ grid points. By finding the closest match between $x$ and one of the $u_i$'s the position of the nearest grid point to the present position of the robot can be obtained. If a Euclidean metric is used to determine the difference between $x$ and all $u_i$'s then the closest match may be obtained for that $u_i$ where:

$$\|x - u_i\|^2 = \|x\|^2 - 2 u_i^T x + \|u_i\|^2 \qquad (1)$$

is a minimum. Given that $x$ is constant with respect to $i$, using equation 1 a linear discriminant function $g_i(x)$ can be written where:

$$g_i(x) = u_i^T x + w_{i0}, \qquad (2)$$

and $w_{i0} = -\frac{1}{2}\|u_i\|^2$. The discriminant function thus uses the cross correlation of the input with the stored patterns, the maximum value of which gives the pattern $u_i$ with which the signature $x$ most closely correlates.

¹ The resistive grid path planning system
² Royal Signals and Radar Establishment

If equation 2 is rewritten by identifying $u_i = \{T_{ij}\}$ and $x = \{V_j\}$ as:

$$g_i(x) = \sum_{j=1}^{n} T_{ij} V_j + w_{i0} \qquad (3)$$

where $n$ is the number of range points obtained in each scan, the patterns recorded at the grid points can be identified with neural weights $T_{ij}$. The cross correlation central to the discriminant function can be written as the vector matrix multiplication, $\sum_{j=1}^{n} T_{ij} V_j$, that is central to a neural network.

The advantage of the formulation given in equation 3 is that it provides a natural representation that, when implemented on a dedicated neural device, allows the simultaneous comparison of all range points with the $k$ learned patterns $u_i$. The maximum of the discriminant function $g_i(x)$ can then be picked out by using a “winner takes all” function on the network.
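
The equivalence between the nearest stored pattern in the Euclidean sense and the largest discriminant value can be checked with a short sketch. The data layout is an assumption: each learned pattern is a list of range readings, one list per grid point, and the function names are hypothetical. The stored patterns play the role of the weights $T_{ij}$ and the current scan the role of the inputs $V_j$.

```python
def euclidean_sq(x, u):
    """Squared Euclidean distance between a scan x and a stored pattern u."""
    return sum((a - b) ** 2 for a, b in zip(x, u))

def discriminant(x, u):
    """g_i(x) = u_i^T x + w_i0, with w_i0 = -1/2 ||u_i||^2."""
    correlation = sum(a * b for a, b in zip(u, x))
    bias = -0.5 * sum(b * b for b in u)
    return correlation + bias

def localise(x, patterns):
    """Index of the grid point whose pattern wins the 'winner takes all' step."""
    scores = [discriminant(x, u) for u in patterns]
    return max(range(len(patterns)), key=scores.__getitem__)

def nearest_pattern(x, patterns):
    """Index of the grid point whose pattern is closest in Euclidean distance."""
    distances = [euclidean_sq(x, u) for u in patterns]
    return min(range(len(patterns)), key=distances.__getitem__)
```

Because $\|x\|^2$ is the same for every stored pattern, `localise` and `nearest_pattern` select the same grid point, but `localise` needs only one multiply-accumulate per weight, which is the operation a neural device performs in parallel.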

The advantage of such a system of course depends upon its implementation. As has previously been mentioned, for any implementation to be of advantage both its speed and size are important characteristics. The localisation algorithm that has been described here can be built quite simply into a small “winner takes all” network. Since both the input vectors $V_j$ and the network weights $T_{ij}$ are analogue this allows the implementation to be mapped easily into the pulse-stream VLSI analogue neural devices that have been designed by the Department of Electrical Engineering at the University of Edinburgh in conjunction with the Robotics Group (Murray et al. 1988, Murray et al. 1990) which provide real time capability. The speed of the localisation system is simply limited by the traverse time of the infra-red scanner, which is approximately a second. Further, the compact size and analogue nature of the pulse stream device allows the processing to take place compactly on the sensor, whereas a more conventional implementation, say with Transputers, would require many more devices with a larger resultant demand for power.

Figure 15: Resistive grid map of the robot's environment; high resistances (black areas) indicate obstacles. The optimal path between P and G is indicated by the black line joining these points.

Path  Planning 

The  path  planning  module  in  the  Oxford  robot  adopts 
a  resistive  grid  approach  to  this  problem.  The  use 
of  resistive  grids  was  suggested  in  a  related  field  by 
Horn  (Horn  1974)  in  the  mid  seventies.  The  idea 
has  also  been  exploited  by  Mead  and  his  co-workers 
and  forms  the  central  element  of  the  silicon  retina 
(Mead  and  Mahowald  1988).  This  approach  maps  the 
robot’s  environment  as  a  resistive  grid,  see  figure  15. 
Here the vertices of the grid are variable resistors: obstacles and difficult terrain are indicated by infinite or high resistances. This provides a map of the terrain in terms of high and low resistances, the valleys and peaks indicating the easy and the difficult (or inaccessible) regions of the environment. An optimal path can be obtained through the environment by simply applying a potential difference between the robot position (P) and its desired destination (G) and following the path of maximum current. Since the current cannot
flow  through  regions  with  an  infinite  resistance  (obsta¬ 
cles)  and  will  be  reduced  in  regions  of  high  resistance 
(difficult  terrain)  following  such  a  current  path  will 
guarantee  an  obstacle  free  path. 
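
The resistive grid idea can be simulated in software as a rough sketch. The following assumes a square, 4-connected grid of unit resistors in which obstacle cells (marked `#`) are open circuits; the grid layout, the function names and the iterative relaxation used to solve for the node voltages are all illustrative assumptions, since the approach described here is an analogue hardware computation. It also assumes that the goal can be reached from the start.

```python
def neighbours(cell, free):
    """Free cells directly above, below, left and right of `cell`."""
    r, c = cell
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [n for n in candidates if n in free]

def solve_potentials(grid, start, goal, sweeps=2000):
    """Relax node voltages with the start held at 1 V and the goal at 0 V."""
    free = {(r, c) for r, row in enumerate(grid)
            for c, ch in enumerate(row) if ch != "#"}
    volts = {cell: 0.5 for cell in free}
    volts[start], volts[goal] = 1.0, 0.0
    for _ in range(sweeps):
        for cell in sorted(free):
            if cell == start or cell == goal:
                continue
            nbrs = neighbours(cell, free)
            if nbrs:
                # Kirchhoff's current law with equal resistors: each node
                # settles at the mean voltage of its connected neighbours.
                volts[cell] = sum(volts[n] for n in nbrs) / len(nbrs)
    return volts, free

def follow_current(volts, free, start, goal):
    """Step repeatedly to the neighbour that receives the largest current."""
    path = [start]
    while path[-1] != goal and len(path) <= len(free):
        here = path[-1]
        # With unit resistors the current equals the voltage drop.
        path.append(max(neighbours(here, free),
                        key=lambda n: volts[here] - volts[n]))
    return path
```

For example, `solve_potentials(["..#", "...", "#.."], (0, 0), (2, 2))` followed by `follow_current` yields a chain of adjacent free cells from the start to the goal. No current flows into obstacle cells or dead ends, so the path followed avoids both.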

Although this method has been tried before (Mitchell and Keirsey 1984) it is the intention of this implementation to map the resistive grid directly into a VLSI device using an array of MOS switches. Here the grid vertices can adopt one of two states: an infinite resistance if the switch is open, and zero resistance with the switch closed. The map of the environment, therefore, consists of a zero resistance surface with regions of infinite resistance representing the obstacles. Allowing the resistance grid to have a 1:1 mapping with the localisation certainty grid enables the current position of the robot in the resistive map to be easily updated as the vehicle moves through the environment.

Figure 16: Path from middle of maze (P) to top left corner (G). The dotted line represents that obtained by repeated calculation of the path direction. The solid line is the complete path obtained by a single calculation of the path at point P.

The  optimal  path  through  the  environment  is  found 
by  applying  a  potential  difference  between  the  robot’s 
position  and  the  desired  destination  and  then  deter¬ 
mining  the  path  of  maximum  current.  This  is  in¬ 
dicated  by  the  node  on  the  hexagonal  grid  that  has 
the  largest  potential  difference  between  itself  and  the 
robot’s  node. 

Having moved to the new node the process can then be recalculated and an updated path found in real time. It has been shown that by continually recalculating the direction of motion after each step a better (i.e. shorter) path can be obtained than by simply calculating the complete path across the grid in one go (Tarassenko and Blake 1991). This is illustrated in figure 16. Here the paths out of the maze from point P to point G have been calculated using:

•  repeated  computation  of  the  path  direction,  dot¬ 
ted  line; 


•  single  computation  of  the  path  from  point  P,  solid 
line. 

Since this calculation is carried out on chip (this is essentially a hardware computation of Kirchhoff's equation) the calculation can take place in the time it
takes  the  MOS  grid  to  settle  once  the  voltage  is  ap¬ 
plied.  Furthermore,  since  the  grid  map  can  be  altered 
by  simply  reconfiguring  the  MOS  switches  from  data 
down-loaded  from  RAM  this  implementation  provides 
a  real  time  reconfigurable  map  that  can  be  updated 
as  soon  as  new  obstacles  are  detected  or  the  position 
of  the  robot  is  determined. 

Control  Architecture 

The  control  architecture  for  this  robot  reflects  the 
structure  of  the  path  planning  and  localisation  sys¬ 
tems  and  has  been  designed  in  a  modular  fashion 
(see  figure  12).  Communication  between  the  differ¬ 
ing  modules  takes  place  asynchronously  via  a  conven¬ 
tional  central  controller  which  routes  the  appropriate 
control  signals  to  and  from  the  modules.  The  cen¬ 
tral  controller  is  also  responsible  for  goal  specification 
and  issuing  commands  to  the  robot  platform  controller 
which,  for  the  purposes  of  this  design,  is  again  conven¬ 
tional.  Since  the  intention  of  this  control  architecture 
is  to  allow  the  bulk  of  the  control  processing  to  take 
place  locally  within  the  localisation  and  path  plan¬ 
ning  modules,  the  central  controller  is  very  simple  in 
construction. 

Discussion 

At the time of writing a small mobile robot has been constructed and the localisation system implemented on dedicated VLSI neural devices. A separate implementation of the path planning system together with the localisation system has also been undertaken. Since the path planning system has not yet been implemented on an appropriate device the integrated localisation/path planning has been carried out on a SUN 4 which communicated with the robot via a radio link. To allow the path planner to operate on the SUN 4 in near real time, dynamic reconfigurability was not used.

This system has been demonstrated on a small battery powered Turtle that has been modified to carry the infra-red scanner. Both the scanner and the robot controller, which is based upon a 68000 processor, are powered by the robot's battery. This implementation allowed the robot to move through a static laboratory environment (i.e. no moving obstacles) with the position of the robot and its direction of motion being updated in real time at an approximate speed of 0.4 m s⁻¹. This performance is limited by the traverse speed of the scanner and the bandwidth of the radio link.

Although no obstacle avoidance function has yet been integrated in the control architecture, this is planned for the (near) future. It is expected that this function will be implemented in a similar fashion to the work described in the first case study, using a sensor/motor association network. However, in this case it is proposed that a number of fixed optical sensors are used rather than ultra-sonic.

Although  the  networks  described  here  have  not  all 
been  fully  implemented  in  special  hardware,  this  case 
study  has  illustrated  the  advantages  to  be  gained  from 
the  use  of  dedicated  hardware  in  terms  of  speed,  ease 
of  integration,  and  particularly  size.  With  the  future 
advent of larger, faster and more complex neural devices it could be argued that the full potential of these systems has still to be realised. It could further be argued that it is only a matter of time before devices
similar  to  those  described  here  are  produced  and  used 
in  real  production  systems. 

Conclusion 

The  case  studies  presented  in  this  paper  have  tried  to 
outline  potential  areas  where  mobile  robotics  will  ben¬ 
efit  from  the  use  of  neural  networks.  To  do  this  studies 
have  been  chosen  which,  rather  than  describing  work 
already  completed  and  available  in  the  scientific  press, 
portray  some  of  the  typical  research  that  is  currently 
being  undertaken.  Since  much  of  this  research  is  still 
in  progress  some  of  the  results  are  inevitably  not  com¬ 
plete. 

The  first  two  studies  demonstrate  how  adaptable  non¬ 
linear  systems  can  be  used  for  the  processing  required 
for  functions  from  obstacle  avoidance  through  to  their 
possible  use  for  the  “data  fusion”  required  to  localise 
a  vehicle  detected  by  a  distributed  surveillance  sys¬ 
tem.  Both  studies  are  relevant  to  a  number  of  the 
fundamental  functional  requirements  for  any  mobile 
robotic system. The third case study illustrates in part how these systems can be implemented in dedicated silicon. As both digital and analogue neural VLSI devices are developed, it is expected that neural networks will provide cheaper, faster, and more compact alternatives to conventional hardware (Holler et al. 1989, LeCun et al. 1990). This is likely to be of direct relevance to military requirements where systems necessarily need to be adaptable, and space on any vehicle is likely to be at a premium.

In the case studies large volumes of data were required to train the networks appropriately. This all too often presents a problem in that although databases exist these are usually too small to provide sufficient data. Two solutions have been suggested to this problem. The first is to use real on-line data by integrating the networks directly into the system in which they are supposed to operate. This has the obvious advantage of enabling an accurate estimate of the neural network's performance, on real data, to be obtained.


The  second  alternative  is  to  adopt  the  solution  demon¬ 
strated  in  the  first  case  study;  the  data  is  simulated. 
If  the  simulation  is  designed  with  care  this  can  quite 
often  provide  a  good  alternative.  Furthermore,  strict 
controls  can  also  be  placed  upon  the  data  allowing  the 
performance  to  be  tested  easily.  However,  by  its  very 
nature  a  simulation  can  never  truly  represent  real  data 
with  all  its  anomalies  and  inaccuracies.  An  analysis 
of  how  a  network  trained  on  simulated  data  would  be¬ 
have  once  placed  in  the  real  world  would  therefore  be 
uncertain  and  difficult  to  verify  without,  as  was  un¬ 
dertaken  with  ALVINN  (Pomerleau  1988),  eventually 
testing the networks on real data.

A  further  alternative  is  to  use  real  data  but  to  gather 
this  into  a  large  data  base,  such  as  a  library  of  im¬ 
ages  for  instance.  This  alternative  has  the  advantage 
of  providing  a  repeatable  set  of  real  data  which  can, 
when  required  for  experimental  reasons,  be  properly 
controlled. However, the work involved in gathering
such  a  database  can  be  very  large  and  particular  con¬ 
sideration  has  to  be  taken  to  ensure  no  bias  is  intro¬ 
duced  into  it  during  the  production  stage.  This  quite 
often  makes  the  production  of  such  a  data  base  sur¬ 
prisingly  expensive  and  therefore  not  desirable. 

Another major problem for the future use of neural
networks for sensor processing and control on a mobile
robot is the verification of these systems. In both civil
and military applications, for any safety-critical
operation, it is necessary for the behaviour of the
systems used not only to be understood but also to be
designed with the appropriate safeguards to prevent
undesirable responses. An appropriate certification
procedure would also be required.

Until recently, with the exception of the large body of
work that exists for some of the unsupervised neural
networks, there has been very little effort in this area.
However, with the use of more mathematically structured
neural networks, such as the radial basis function
networks (Broomhead and Lowe 1988), verification has
started to become a possibility. Furthermore, with the
more recent interest of the control research community,
the problem of certifying such systems has started to be
addressed (Simper 1991). Although there are no
procedures laid down as yet to ensure the verification of
the design, configuration (training), and testing of a
neural network, it has been suggested that principles
similar to those used for the verification of a
mathematical process be used. A thorough understanding
of the problem that the system is to be designed to solve
is required, something which is generally necessary when
trying to design a network solution for a problem in any
case. This can be difficult since many problems to which
neural networks are being applied are highly non-linear
and therefore may not be easily tractable. With an
appropriate understanding it is suggested (Simper 1991)
that sufficient safeguards could be put in place (e.g. an
expert system harness) to check against undesirable
inputs being presented to the network or outputs from the
network having an undesirable effect.

It  is  accepted  that  the  robotic  systems  that  have 
been  described  in  the  case  studies  are  somewhat  sim¬ 
ple  compared  to  the  all  terrain  robotic  systems  that 
are  required  for  the  military  environment.  This  re¬ 
flects  the  fact  that  the  use  of  neural  networks  in  the 
area  of  mobile  robotics  is  still  limited.  However,  the 
case  studies  that  have  been  presented  have  demon¬ 
strated  that  neural  networks  offer  potential  solutions 
to  some  of  the  problems  that  are  generic  to  the  whole 
field of mobile robotics, and, if implemented in
dedicated VLSI silicon, will hopefully have a direct
bearing on the future construction of such vehicles where
fast, compact, and adaptable systems are required.
As  these  devices  become  available  the  true  nature  of 
the  advantages  to  be  obtained  from  the  use  of  neural 
networks  should  become  apparent  over  the  next  few 
years. 
Appendix A: Mobile Robot Initiatives

Commercial  Research  &  Development 

•  ESPRIT II Panorama (including BAe, SAGEM
(France), Rauma-Repola (Finland), Tamrock
(Finland), University of Helsinki, Universidad
Politecnica de Madrid, Easams (Frimley),
Southampton University, Central Energy Atomique
(Grenoble & Saclay, France), SEPA (FIAT,
Italy), EID (Portugal), LNETI (Portugal), CRIF
(Belgium)).

Target vehicles are 4x4 Mercedes Jeep, Rauma-Repola
Forwarder (FMG 933C Lokomo), Tamrock Driller.
Five year project ending March 1994.

•  ESPRIT I Voila (including: GEC, Plessey, ELSAG,
MS2i, RMR, Oxford University, Sheffield
University, INRIA, University of Genoa):-
production of a vision guided mobile robot.

•  MARDI (including BAe, UK MOD, Royal Armaments
Research and Development Establishment
(RARDE), Southampton University, Bristol
University, Lucas):- production of an all terrain
military robot.

•  GEC/Oxford  University  ‘Turtle’  project. 

•  Advanced Robotics Research Centre, Salford:-
UK national center for robotics.

•  PROMETHEUS. Companies involved include
Jaguar, Lucas, Pilkington, BMW, Porsche,
Volkswagen, SAAB and Volvo, with PSA (a French
consortium consisting of Peugeot, Citroen and
Talbot). Academic involvement is with the
University of Southampton and University of Oxford.

•  IVHS - Intelligent Vehicle Highway System. This
is a US Department of Transport project, which
commenced in 1989 and is heavily funded with
181 approved projects so far.

•  Daimler-Benz  AG,  Stuttgart,  Germany  -  auto¬ 
mated  guidance  system. 

•  Mazda,  Japan  -  three  autonomous  vehicle 
testbeds. 

•  Nissan Motor Company, Yokosuka, Japan.

Fuzzy logic steering control of an autonomous
vehicle.

•  Volkswagen,  Germany 

Self  Parking  vehicles  research 

•  Toyota,  Japan 

•  SENTRY  -  Denning  Mobile  Robotics  Inc., 
Woburn,  Mass.,  USA. 


•  EUREKA - AMR (Advanced Mobile Robot).

•  EUREKA  -  MITHRA. 

•  Autonomous  Land  Vehicle,  Martin  Marietta 
Corp.,  Denver,  USA. 

DARPA funded project, vehicle intended mainly
for military purposes.

•  Fujitsu Lab, Ltd., Kawasaki, Japan.

Image  processing  for  autonomous  vehicles. 

•  Mech.  Eng.  Lab,  AIST,  MITI,  Ibaraki,  Japan. 
Steering  control  for  an  autonomous  vehicle. 

•  Tokyo  Research  Lab,  IBM  Japan,  Japan.  -  visual 
navigation  of  autonomous  vehicles. 

•  Shinko Electric Co. Ltd, Hyogo, Japan. -
ultrasonics guided autonomous vehicles.

•  Naval  Ocean  Systems  Centre,  San  Diego,  USA.  - 
ground  surveillance  robot. 

•  Sandia  National  Labs,  Albuquerque,  N.  Mexico, 
USA.  -  fleet  of  vehicles  for  remote  control  and 
autonomous  operation. 

•  Savannah River Lab., Aiken, South Carolina,
USA. - autonomous vehicles for nuclear
applications.

•  Jet Propulsion Lab, Pasadena, California, USA.
- primarily work for the Mars Rover vehicle.

•  Tokyo  Institute  of  Technology,  Japan  -  primar¬ 
ily  walking  vehicles  but  with  spin-off  applications 
including  control  technology. 

•  FMC Corporation, Central Engineering
Laboratories, Artificial Engineering Centre, Santa Clara,
California, USA. - multi-goal, real-time global
path planning for an autonomous land vehicle.

•  Army  Engineer  Topographic  Labs,  Fort  Belvoir, 
Virginia,  USA.  -  robotic  reconnaissance  vehicle 
with  terrain  analysis. 

Academic  Research 

•  Carnegie-Mellon  University,  Pittsburgh,  USA. 
Chuck  Thorpe,  -  Navlab/Alvan  and  Terregator. 

•  Oxford  University,  Engineering  Department, 
Prof.  Mike  Brady. 

•  MIT, USA, R. Brooks & A. Waxman.

•  LAAS,  France,  Raja  Chatila  -  HILARE. 

•  Heriot-Watt University, Edinburgh, Intelligent
Automation Lab., Chantler, M.J. et al.

•  Southampton  University,  Department  of  Aero¬ 
nautics  and  Astronautics,  Prof.  Chris  Harris. 
•  Tech. Univ. Munich, Germany, Lehrstuhl für
Mikrowellentech - imaging radar for autonomous
vehicles.

•  Massachusetts University, Amherst, Dept. of
Computer and Information Science - Autonomous
Vehicle Navigation Project.

•  Oakland  University,  USA,  Centre  for  Robotics 
and  Advanced  Automation  -  Autonomous  vehi¬ 
cle  project. 

•  Univ. der Bundeswehr München, Neubiberg,
Germany, Inst. für Messtech, Prof. Dickmanns.

•  Ohio  State  University,  Columbus,  USA. 

•  University  of  Maryland,  Center  for  Automation 
Research,  Maryland,  USA.  -  computer  vision  sys¬ 
tems  for  Martin  Marietta  autonomous  vehicle. 
SUMMARY

Multisensor data fusion (MDF) is the synergistic
application of data from several sources, typically
sensors, toward a specific task. In the area of guidance
and control, data fusion plays a very important role. By
combining the information from several sensors it is
possible to improve the performance of guidance and
control systems. Neural networks are ideally suited to
applications where only a few decisions are required from
a massive amount of data. In this sense, neural networks
should play a crucial role in future data fusion systems.
This paper will describe several methods of applying
neural networks to data fusion, including:
self-organizing hierarchical neural systems, multi-layer
error correction learning networks, and single layer
pattern completion systems. Application case studies will
be examined to determine how researchers have applied
neural networks to data fusion. In addition, a discussion
of feature representation and feature weighting will be
provided.


1.  INTRODUCTION 

Multisensor data fusion remains one of the most
challenging research areas in the computational
sciences. Multisensor data fusion is the process of
combining the data from several distributed sensors
(potentially thousands) and making a decision. The
sensors can vary widely in reliability, the type of data
being received can vary, and the resulting decision might
be required in real-time. The applicability of neural
networks to this environment represents a natural
synergism between the inherent capabilities of neural
networks and the computational demands of the
multisensor data fusion (MDF) problem. This paper
reviews current MDF techniques, describes three neural
network approaches to MDF, and presents some potential
MDF applications in guidance and control.

Figure 1: Multi-sensor Data Fusion System

2.  OVERVIEW  OF  DATA  FUSION 

Data fusion is the synergistic combination of data from
several sources into a coherent decision. When the data
is supplied solely from sensors the result is a
multisensor data fusion system. Figure 1 illustrates a
general multisensor data fusion (MDF) system. In
general, an MDF system can be viewed as a
situation-response system. Some phenomenon occurs in
the environment that is observed by a set of N sensors.
Each sensor collects information and transmits it across
a channel where features are abstracted from the sensor
data. The entire set of features recorded during a given
interval of time represents the situation. The set of
features produced from each sensor is subject to
different levels of noise, different time-delays for
information propagation, and different relative
importance weightings between sensors. The situation
data is eventually fed to a data fusion system where a
response must be provided from the data. It is
immediately evident why data fusion is so difficult.

2.1.  Advantages  of  Multisensor  Data  Fusion 

The primary motivation for data fusion is the
realization that single sensor information is often not
enough. The synergistic collection of information from a
wide variety of sensors is required to produce reliable
responses. Although this is the primary reason for data
fusion, there are others. Luo & Kay (1989) have
identified four advantages of MDF systems:

Redundancy:  By  receiving  sensor  informa¬ 
tion  from  several  similar  sensors  it  is  possible 
to  attain  improved  accuracy.  In  systems  that 
utilize  redundant  sensors  the  fusion  is  per¬ 
formed  at  a  low  level. 

Complementary:  By  receiving  sensor  infor¬ 
mation  from  different  sensors,  it  is  possible  to 
create  a  more  robust  representation  of  the  phe¬ 
nomenon  being  sensed.  In  systems  that  utilize 
complementary  sensors  the  fusion  is  performed 
at  a  high  level. 

Timeliness:  By  distributing  the  sensing  task 
to  several  sensors  it  is  possible  to  produce 
faster  decisions.  Single  sensor  systems  often 
need  to  repeatedly  sample  prior  to  emitting  an 
accurate  decision.  Multisensor  systems  take 
advantage  of  the  redundancy  to  achieve  the 
desired  accuracy. 

Cost:  Depending  on  the  system,  it  is  possi¬ 
ble  to  provide  a  multisensor  system  at  less  cost 
than  a  single  sensor  system. 
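
The redundancy advantage listed above can be illustrated
numerically. The sketch below is not taken from Luo & Kay; it
assumes N identical sensors whose errors are independent,
zero-mean and Gaussian with standard deviation sigma, and it
assumes the low-level fusion is a plain average. Under those
assumptions the error of the fused estimate falls roughly as
sigma divided by the square root of N. All function names and
parameter values are illustrative.

```python
import math
import random


def fuse_redundant(readings):
    """Low-level fusion of redundant sensors: a plain average."""
    return sum(readings) / len(readings)


def rms_error(n_sensors, true_value=10.0, sigma=1.0, trials=2000, seed=0):
    """Empirical RMS error of the fused estimate over many trials.

    Assumes independent zero-mean Gaussian noise on every sensor.
    """
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        readings = [true_value + rng.gauss(0.0, sigma)
                    for _ in range(n_sensors)]
        total_sq += (fuse_redundant(readings) - true_value) ** 2
    return math.sqrt(total_sq / trials)


if __name__ == "__main__":
    for n in (1, 4, 16):
        theory = 1.0 / math.sqrt(n)  # sigma / sqrt(N) with sigma = 1
        print(n, round(rms_error(n), 3), "theory:", round(theory, 3))
```

If the sensor errors are correlated, or the sensors differ in
quality, a plain average is no longer the best choice and the
improvement will be smaller than this idealized figure.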

2.2.  Multisensor  Data  Fusion  Paradigms 

An MDF system requires several capabilities. It must be
able to incorporate and arbitrate data from a large
number of sources. It should allow the relative weighting
of sources to be done easily. Finally, it should provide
timely responses. Luo and Kay (1989) have outlined four
primary paradigms that meet all of these requirements:

Hierarchical Phase-Template Systems: A general paradigm
for robotic systems based upon four temporal phases of
sensor-to-object distance (far-away, near-to, touching,
and manipulation).

Logical Sensors: Abstracting each sensor from a physical
device to a logical entity allows collections of sensors
to be represented very elegantly. This approach is useful
in applications that require a world model to operate in
harmony with the sensor system (e.g. robotics).

Object  Oriented  Programming:  Each  sensor 
in  the  MDF  system  is  represented  as  a  data 
object.  An  object  contains  both  data  and  func¬ 
tions  and  it  communicates  to  other  objects  via 
messages.  Like  the  Logical  Sensors  approach, 
this  has  a  very  appealing  general  structure  that 
is  amenable  to  several  symbolic-based  data 
fusion  tasks. 

Neural  Networks:  Create  patterns  from  the 
various  sensors  (via  preprocessing)  and  process 
the  multiple  patterns  using  a  neural  network. 
This technique is the focus of the remainder of this
paper.

2.3. Examples of Multisensor Data Fusion Systems

The most remarkable MDF systems are mammals,
especially humans. Their ability to fuse auditory,
visual, olfactory, and tactile information is
unparalleled. Recent work by Singer and others
(Barinaga, 1990) has exposed some clues about how humans
are able to perform sensor fusion. Information in
disparate regions of the brain has been found to
phase-lock and operate synchronously. This research is
revealing a new approach to neural systems where
information is stored in oscillations of different
frequencies. Relative to mammals most MDF systems pale,
but the full capability of a human is not necessary to
provide improved performance for most applications.
Recent examples of highly capable MDF systems include
robots, surveillance systems, and target tracking
systems.

2.4. Multisensor Data Fusion Surveys

This paper will review the neural network aspect of MDF
with an emphasis on guidance and control. There are
several resources that discuss other aspects of MDF.
Maren & Pereira (1989) have conducted an extensive
survey of multisensor information fusion that analyzes
sensor selection, levels of abstraction, architectures,
and methodologies for fusion. Luo and Kay (1989) have
also conducted an extensive review of multisensor
integration and fusion with an emphasis on robotics
applications. Mitiche & Aggarwal (1986) have performed a
survey of multisensor integration with an emphasis on
image processing applications, and Garvey (1987) has
analyzed the Artificial Intelligence approaches to
multisensor information fusion.


3.  NEURAL  NETWORK  DATA  FUSION 

There  are  three  primary  methods  for  neural 
network  data  fusion:  (1)  pattern  completion,  (2) 
pattern  matching,  and  (3)  hierarchical  systems. 
Each  neural  network  fusion  technique  has  its 
own  merits  and  an  affinity  for  different  applica¬ 
tion  areas.  In  the  following  sections  each  of 
these  techniques  will  be  examined  with  specific 
applications  cited  with  each  technique. 

3.1.  Pattern  Completion  Neural  Fusion 

The pattern completion technique for neural network MDF
is illustrated in Figure 2. All of the sensor data types
are concatenated together into a large vector with the
desired response. As an example, Anderson, et al. (1990)
used this representation for the classification of radar
emitters. In this instance, the data types were pulse
repetition interval, operating frequency, and so on, and
the corresponding output was the name of the radar
system.

Pattern completion neural fusion fits within a
situation-response framework very well. Applications
suited to this fusion technique include target
recognition, signal classification, and control
applications. Target recognition might utilize infrared,
optical, radar and acoustic data to describe the
situation and correlate this information with the
classification of the target as a response. Signal
classification can utilize Fourier spectra, duration of
signal, and total signal power as the situation and
produce a classification of the signal as the response.
Control applications can collect sensor data from the
platform being controlled and merge this with infrared
information to create the situation, and the response
would be the next action to take.

Pattern  completion  neural  fusion  primarily 
relies  on  autoassociative  feedback  neural  net¬ 
works  (Simpson,  1990a  &  1991).  Neural  net¬ 
works  that  can  be  used  for  pattern  completion 
include  the  Brain-State-in-a-Box  (Anderson,  et 
al.,  1977)  and  the  Hopfield  associative  memory 
(Hopfield,  1982).  Because  of  the  feedback 
nature  of  these  systems,  stability  is  usually 
achieved  at  the  expense  of  nonlinear  saturation 
points  for  each  processing  element’s  response. 
In  other  words,  feedback  neural  systems  tend  to 
require  a  binary  representation  of  the  data  to 
work  most  effectively. 
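
Pattern completion with an autoassociative feedback network
can be sketched as follows. This is a minimal illustration
under stated assumptions, not the method of any of the cited
papers: it uses a Hopfield-style memory with Hebbian
outer-product weights, bipolar (+1/-1) codes, and
asynchronous updates. The two 8-bit sensor codes and 2-bit
response codes are hypothetical. Each stored pattern is the
sensor code concatenated with its response; at recall the
response entries are set to 0 (unknown) and the network
fills them in.

```python
def train_hopfield(patterns):
    """Hebbian outer-product weights with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def complete(weights, probe, max_sweeps=10):
    """Asynchronous recall; unknown entries of the probe are 0."""
    x = list(probe)
    n = len(x)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(weights[i][j] * x[j] for j in range(n))
            # Keep the current state when the net input is exactly zero.
            new = x[i] if h == 0 else (1 if h > 0 else -1)
            if new != x[i]:
                x[i] = new
                changed = True
        if not changed:  # a stable state has been reached
            break
    return x


# Hypothetical bipolar codes: 8 sensor bits followed by 2 response bits.
SENSOR_A = [1, 1, 1, 1, -1, -1, -1, -1]
RESPONSE_A = [1, -1]
SENSOR_B = [1, -1, 1, -1, 1, -1, 1, -1]
RESPONSE_B = [-1, 1]

WEIGHTS = train_hopfield([SENSOR_A + RESPONSE_A, SENSOR_B + RESPONSE_B])

if __name__ == "__main__":
    recalled = complete(WEIGHTS, SENSOR_A + [0, 0])
    print(recalled[-2:])  # [1, -1], the response stored with SENSOR_A
```

With only two nearly orthogonal patterns the recall is
reliable, and a single corrupted sensor bit is also repaired.
The capacity of such memories is limited, so larger pattern
sets need correspondingly longer codes.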


The  restriction  to  a  binary  representation 
requires  some  clever  preprocessing  that  pro¬ 
vides  the  requisite  information.  Several  tech¬ 
niques  have  been  developed  for  effectively 
representing  information  in  a  binary  vector, 
including  complete  enumeration,  thermometer 
codes,  and  closeness  codes  (Collins,  1990). 
When  using  pattern  completion  neural  fusion  it 
is  vitally  important  to  develop  a  robust  code 
that  can  be  used  to  represent  the  problem,  or  the 
full  potential  of  the  neural  network  will  not  be 
achieved.  The  code  that  is  developed  must 
accurately  represent  both  the  value  of  the  sen¬ 
sor  data  and  the  relative  importance  of  that  sen¬ 
sor  data. 
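
As an illustration of one of the binary codes named above, the
sketch below implements a thermometer code: a value in a known
range is mapped to a run of ones followed by zeros, so that
nearby values differ in only a few bits. The function names,
ranges and bit widths are illustrative. The `weighted_code`
helper shows one possible way of expressing relative
importance, by repeating a sensor's bits; that scheme is an
assumption made here and is not taken from Collins (1990).

```python
def thermometer_code(value, low, high, n_bits):
    """Encode value in [low, high] as n_bits; the first k bits are 1."""
    if not low <= value <= high:
        raise ValueError("value outside the coding range")
    k = round((value - low) / (high - low) * n_bits)
    return [1] * k + [0] * (n_bits - k)


def hamming(a, b):
    """Number of positions in which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def weighted_code(code, weight):
    """Assumed weighting scheme: repeat the code `weight` times so the
    sensor contributes proportionally more bits to the fused vector."""
    return code * weight


if __name__ == "__main__":
    c3 = thermometer_code(3.0, 0.0, 10.0, 10)
    c4 = thermometer_code(4.0, 0.0, 10.0, 10)
    c9 = thermometer_code(9.0, 0.0, 10.0, 10)
    print(c3)                # [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(hamming(c3, c4))   # 1: close values give close codes
    print(hamming(c3, c9))   # 6: distant values give distant codes
```

The useful property is that Hamming distance between codes
grows with the difference between the encoded values, which an
ordinary binary-number encoding does not guarantee.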

3.2.  Pattern  Matching  Neural  Fusion 

One of the most common forms of neural data fusion is the
pattern matching approach. As shown in Figure 3, the
situation is passed to the network as the input and the
response is produced from the network as an output.
There are several neural networks that can be used for
pattern matching neural fusion, including the Boltzmann
Machine (Ackley, Hinton & Sejnowski, 1985), the Cauchy
Machine (Szu, 1986), the Probabilistic Neural Network
(Specht, 1990), the Adaline/Madaline (Widrow & Winter,
1988), the Functional Link Net (Pao, 1989), and
backpropagation (Werbos, 1974; Parker, 1982; Rumelhart,
Hinton & Williams, 1986). With the exception of the
Adaline/Madaline and the Functional Link Net, each of
these pattern matching neural networks has more than two
layers. Although it is not necessary to have a
multi-layer neural network for pattern matching neural
fusion, the interrelationships between the various
sensor data types tend to be nonlinear and multi-layer
neural networks tend to be the most common form of
nonlinear pattern matching networks (others include
higher-order neural networks such as the Functional Link
Net). In addition to the pattern matching neural
networks, it is also possible to include the pattern
classification networks such as Adaptive Resonance
Theory (Carpenter & Grossberg, 1987a & 1987b), Learning
Vector Quantization (Kohonen, 1990), and the Fuzzy
Adaptive Min-Max Unsupervised Classifier (Simpson,
1990b).

3.2.1.  Automatic  Target  Recognition 


Ruck, et al. (1990) have used the multilayer neural
network pattern matching MDF approach for the
discrimination of various objects in images. The data
used in the experiments was forward looking infrared
(FLIR) and absolute range. After the image was segmented
into blobs, features were extracted from the data from
each of the two sensors. The FLIR data was broken into a
feature set that included number of pixels in the blob,
background standard deviation, and complexity (ratio of
border pixels to total pixels). The absolute range
feature set included height of blob, complexity of blob
(computed the same as the FLIR complexity), and pixel
standard deviation across the blob.
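
The kind of blob features described above can be sketched as
follows. This is an illustrative reconstruction, not the code
of Ruck, et al.: the definition of a border pixel (a blob
pixel with at least one 4-connected neighbour outside the
blob or off the image) is an assumption, the background
standard deviation is omitted for brevity, and the tiny test
image is hypothetical.

```python
import math


def blob_features(mask, values):
    """Features of one blob.

    mask   -- 2-D list of 0/1 marking the blob pixels
    values -- 2-D list of sensor values (FLIR intensity or range)
    """
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols)
              if mask[r][c] == 1]
    if not pixels:
        raise ValueError("mask contains no blob pixels")

    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    # Assumed definition: a border pixel has a 4-neighbour outside the blob.
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    border = [(r, c) for r, c in pixels
              if not all(inside(r + dr, c + dc) for dr, dc in neighbours)]

    vals = [values[r][c] for r, c in pixels]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    blob_rows = [r for r, _ in pixels]
    return {
        "n_pixels": len(pixels),
        "complexity": len(border) / len(pixels),
        "height": max(blob_rows) - min(blob_rows) + 1,
        "pixel_std": std,
    }


def fuse(flir, rng):
    """Concatenate per-sensor features into one network input vector."""
    return [flir["n_pixels"], flir["complexity"],
            rng["height"], rng["complexity"], rng["pixel_std"]]


if __name__ == "__main__":
    # Hypothetical 5x5 image with a 3x3 blob in the centre.
    mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)]
            for r in range(5)]
    flir = [[2.0] * 5 for _ in range(5)]
    rng_img = [[float(r)] * 5 for r in range(5)]
    print(fuse(blob_features(mask, flir), blob_features(mask, rng_img)))
```

The fused list is what would be presented to the input layer
of the pattern matching network; in practice each feature
would also be scaled to a comparable range first.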

The features were then concatenated together to form a
large input vector to a backpropagation network. The MDF
system was first tested using only range data to
classify the blobs and then using data from both
sensors. The results showed that the backpropagation
network was able to handle the fusion problem
effectively and that performance improved when
multisensor data was used. Backpropagation is not the
only neural classifier that could have been used. Other
neural network pattern classifiers could have resulted
in an equally acceptable solution. In the next example
of pattern matching neural fusion the output is not a
classification of the response; rather, it is a set of
values, hence a pattern classification system would not
be applicable here.

3.2.2.  Space  Object  Status  Monitoring 

Eggers & Khuon (1990) have used a backpropagation
network for the monitoring of space objects. The sensor
data consisted of two radars, one operating in the L-band
and the other operating in the X-band. Each set of sensor
data was preprocessed using a fourth-order
autoregressive model that produced a four-dimensional
feature vector. These two vectors were concatenated
together to form the input to the backpropagation
network. The output from the network was a
four-dimensional vector describing the current state of
the object (stable, pitch, roll, and yaw). The system
produced reliable output responses.
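
The preprocessing step described above can be sketched as
follows. The source does not say how the autoregressive
model was fitted or what radar quantity was modelled, so this
example makes two assumptions: the four AR coefficients are
estimated by ordinary least squares, and synthetic
noise-driven signals stand in for the L-band and X-band
returns. All names and coefficient values are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ar_features(signal, order=4):
    """Least-squares estimate of a_1..a_order in
    x[t] = a_1 x[t-1] + ... + a_order x[t-order] + e[t]."""
    gram = [[0.0] * order for _ in range(order)]
    rhs = [0.0] * order
    for t in range(order, len(signal)):
        past = [signal[t - k] for k in range(1, order + 1)]
        for i in range(order):
            rhs[i] += past[i] * signal[t]
            for j in range(order):
                gram[i][j] += past[i] * past[j]
    return solve_linear(gram, rhs)


def simulate_ar(coeffs, n, seed):
    """Synthetic stand-in for a radar return: a noise-driven AR process."""
    rng = random.Random(seed)
    x = [0.0] * len(coeffs)
    for _ in range(n):
        pred = sum(c * x[-k] for k, c in enumerate(coeffs, start=1))
        x.append(pred + rng.gauss(0.0, 1.0))
    return x[len(coeffs):]


if __name__ == "__main__":
    l_band = simulate_ar([0.4, -0.3, 0.1, -0.1], 4000, seed=1)
    x_band = simulate_ar([-0.2, 0.3, 0.2, 0.1], 4000, seed=2)
    # Eight-dimensional input vector for the fusion network.
    fused = ar_features(l_band) + ar_features(x_band)
    print([round(v, 2) for v in fused])
```

The concatenated eight-element list plays the role of the
network input described in the text; the four-element state
output would come from a trained network, which is not shown
here.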

3.3.  Hierarchical  Neural  Fusion 

Hierarchical neural networks are used in fusion systems
that require low level sensor information to be
abstracted into higher level features prior to the
fusion. In the previous two instances it was assumed that
the feature extraction process was sufficient to create a
representation that could be used by a neural network.
Sometimes it is not possible to extract enough
information using non-neural network techniques,
especially in image processing applications where scale,
rotation, and translation invariance are key elements
that need to be addressed prior to fusion.

Figure 4 shows a typical hierarchical network. This
network has five levels. There are several input planes
(level 1) that receive data from the sensors. In
successive levels the features are gradually extracted
from the data and fused together. At level 4 there is a
final fusion of the information into a representation
that is classified by level 5. Each level of the
hierarchical network is a two-layer neural network
classifier. The connections within each level (slab) are
modifiable and are used to classify the features into a
more abstracted representation. The connections between
levels are hard-wired to extract certain types of
feature composites. Typical adaptation algorithms for
these modifiable connections include Hebbian learning
(Fukushima, 1988), competitive learning (Hecht-Nielsen,
1990), and adaptive resonance (Rajapakse, Jakubowicz, &
Acharya, 1990).

The  first  system  to  employ  this  form  of 
hierarchical  composition  was  the  Neocognitron 
(Fukushima,  1988)  which  was  applied  to  hand¬ 
written  character  recognition.  Other  applica¬ 
tions  of  the  neocognitron  include  situation 
analysis  (Jakubowicz,  1990)  and  automatic  tar¬ 
get  recognition  (Gilmore  &  Czuchry,  1990). 

3.3.1.  Target  Recognition 

An  ART-1  based  hierarchical  system  has 
been  applied  to  the  recognition  of  target-like 
images  (Rajapakse  &  Acharya,  1990).  The 
input  sensors  were  simulated  to  represent  two 
different  types  of  data.  The  features  present  at 
each  sensor  are  demonstrated  to  be  insufficient 
to  classify  the  image  when  used  alone,  but  the 
combination  of  sensors  was  successful  at  the 
same  task.  The  system  is  currently  being 
extended  to  work  with  biomedical  images. 

4.  APPLICATIONS  OF  NEURAL  DATA 
FUSION  TO  GUIDANCE  AND  CONTROL 

There  are  several  areas  where  neural  fusion 
can  be  applied  to  guidance  and  control.  The  fol¬ 
lowing  three  sections  outline  some  candidate 
application  areas  and  provide  some  guidelines 
for  applying  the  neural  fusion  techniques 
described  above. 

4.1.  Guidance  Systems 

Guidance systems require a few decisions to be made from
a massive amount of data. Neural networks are ideally
suited for these types of applications. Because of the
parallel nature of neural networks, additional sensors
will not necessarily slow the system. In addition,
neural networks are able to weight the relative
importance of each type of sensor automatically.

4.2.  Control  Systems:  Inverted  Pendulum 

Sometimes there are several different types of sensor
data available, but the use of the data is not clear. The
inverted pendulum (broom balancing) is an example of
such a system. Sensors placed on the cart and on the
joint of the inverted pendulum can be used to produce
data that is used to determine which direction to move
the cart so the pendulum will remain upright. It is
possible to add an image sensor that can also determine
the position of the pendulum relative to the cart. Fusing
the information is not straightforward using
conventional techniques, but a pattern matching neural
fusion approach using a supervised learning neural
network like the backpropagation network presents a
feasible approach.

Other platforms that might utilize data fusion for
control include robots, automobiles and aircraft. Robots
can utilize MDF for navigation purposes. Information
from high-frequency active sonar and from cameras can be
fused to control a mobile robot. Cars with look-ahead
cameras can provide data that can be fused with sensors
on the suspension system to produce commands back to the
suspension system that will adjust the tension to fit the
needs of the road. And aircraft can fuse engine sensor
data to control air intake and fuel flow to optimize for
fuel efficiency, speed, or stealth purposes.

4.3. Surveillance Systems: Border Surveillance

Although  it  is  not  a  strict  guidance  or  con¬ 
trol  application,  the  use  of  neural  fusion  for  sur¬ 
veillance  is  extremely  promising  and  worthy  of 
mention.  One  of  the  most  difficult  elements  in 
a  surveillance  system  is  the  fusion  of  data  from 
the  massive  number  of  available  sensors. 

As an example, border surveillance sensors might include
acoustic, seismic, radar and intelligence. Effectively
fusing this data to classify the activity is difficult
using conventional techniques. It is possible, however,
to train a neural network to perform this task by
supplying the network with examples of the various
sensor readings and the associated activity using a
pattern matching neural fusion approach. Correlations
that might not have been intuitively obvious are often
discovered by pattern matching neural networks, an
extremely useful attribute in this application.

Other areas where data fusion can be used include home
security systems that fuse motion and infrared data to
determine if an intruder is in the area. The output of
the system can be used to control lights and sirens in
the localized area of intrusion while automatically
notifying law enforcement.

5.  CONCLUSIONS 

Neural  fusion  techniques  are  becoming 
more  prominent  because  of  their  ability  to  eas¬ 
ily  handle  massive  amounts  of  data  from  a  wide 
variety  of  sources.  The  use  of  data  fusion  pro¬ 
vides  a  mechanism  for  improving  the  reliability 
of  guidance  and  control  systems  at  the  expense 
of  greater  system  complexity  and  more  compu¬ 
tational  requirements.  The  use  of  neural  net¬ 
works  in  a  MDF  environment  represents  a 
natural  fit  of  the  strengths  of  neural  networks 
with  the  weaknesses  in  data  fusion. 
Summary

Several advanced neural network architectures are
expected to be of significant value in guidance and
control. This paper reviews three advanced neural
network architectures (the graded learning
network, the recurrent backpropagation network,
and the hierarchical matched filter network) and
briefly discusses how they might be applied to
problems in guidance and control.

1  Introduction 

Many  interesting  problems  in  guidance  and  control 
can  be  reduced  to  the  problem  of  implementing  a 
time-dependent mapping (i.e., a spatiotemporal
mapping)  between  an  n-dimensional  input  vector 
and  an  m-dimensional  output  vector.  Such 
mapping  problems  are  difficult  to  solve  using 
conventional  techniques  such  as  linear  control 
theory,  statistical  pattern  recognition,  or  dynamic 
programming,  due  to  the  inherent  complexity  of 
spatiotemporal  patterns,  particularly  when 
insensitivity  to  various  warping  transforms  is 
demanded.  Recent  advances  in  neural  network 
technology  may  provide  significant  new  capabilities 
for  addressing  many  of  these  problems. 

This paper presents three neural network
architectures that solve spatiotemporal mapping
problems: the recurrent backpropagation network,
the graded learning network, and the hierarchical
matched filter network. It is expected that these
networks  will  become  increasingly  important  in 
solving  complex  spatiotemporal  mapping  problems. 
Thus,  the  focus  of  this  paper  is  to  familiarize  the 
reader  with  the  structure  and  operation  of  each 
network,  and  to  point  out  similarities  and 
differences. 

These  three  networks  were  selected  because  they 
represent each of the three neural network learning
paradigms: supervised learning (recurrent
backpropagation network), reinforcement learning
(graded learning network), and self-organization
(hierarchical matched filter network). This will
allow  us  to  compare  the  different  types  of  learning 
and  to  gain  insight  into  the  suitability  of  each 
learning  paradigm  for  various  types  of  problems. 

The  graded  learning  and  recurrent 
backpropagation  networks  are  very  similar  in  their 
approach  to  approximating  spatiotemporal 
mappings.  Their  primary  differences  are  in  the 
training  procedures  that  are  used.  The  common 
architecture  shared  by  these  two  networks  is 
described  in  Section  3.  This  architecture  consists  of 
a  single  functional  layer  of  fully  connected 
processing  units.  Both  of  these  network 
architectures  address  problems  involving  the 
approximation  of  arbitrary  fixed  spatiotemporal 
mappings. 

In contrast, the hierarchical matched filter
network is designed specifically for spatiotemporal
pattern classification problems. Its network
architecture is fundamentally different from that of
the other two networks. This architecture is
described in Section 6.

2  Spatiotemporal  Mappings 

Intuitively,  we  can  describe  a  spatiotemporal 
mapping  as  a  mapping  from  a  temporal  sequence  of 
n-dimensional  input  vectors  to  a  temporal  sequence 
of m-dimensional output vectors. Such an intuitive
description  can  be  made  mathematically  precise. 

We define $W^{n}_{\nu,p}(C)$ to be the vector Sobolev space
of all $L^p$ generalized n-dimensional real vector
functions of time with $L^p$ generalized derivatives up
to order $\nu$ on a compact set $C \subset \mathbb{R}$ (for a gentle
introduction to generalized functions see [14]; for a
terse definition see [1]). With this definition, a
spatiotemporal mapping is defined as a mapping

$$X : A \subset W^{n}_{\nu,p}(C) \to B \subset W^{m}_{\nu,p}(C).$$

Examples  of  spatiotemporal  mappings  include  a 
speech  classifier  that  maps  a  time-varying  speech 
power  spectrum  to  a  word  class  number,  and  a 
control  system  that  maps  a  plant  disturbance  to  a 
system  control  function.  More  specifically,  a  speech 
classifier  takes  a  time-varying  speech  power 
spectrum  (a  spatiotemporal  pattern) 

$$x : C \subset \mathbb{R} \to \mathbb{R}^n$$

and  maps  it  to  a  class  number  function  which,  at 
each  time  step,  gives  the  class  number  of  the  word 
that  has  most  recently  been  completed  (i.e.,  the 
class  number  function  is  an  integer-valued  function 
of  time). 

In  a  control  system  we  typically  have  a  plant 
with  mathematical  form 

$$f(x(t), u(t), d(t)) = \dot{x}(t),$$

where x(t) is the state vector of the plant (typically
composed of sensor readings) at time t, $\dot{x}$ is the
time rate of change of the state, u(t) is the vector
of control signals at time t, and d(t) is the vector
of plant disturbances (deviations from perfect
closed-system mathematical operation) at time t.
The goal of the control system is typically to
achieve some particular plant state (such as
a specific final sheet thickness in a steel rolling
mill). Thus, a control system is a mapping from an
outside disturbance function d to a control function
u that can achieve the desired control goal. This
view  of  control  theory  assumes  that  the  plant  has  a 
fixed  dynamical  structure  so  that  the  controller’s 
job  is  to  produce  a  control  vector  that  deals  with 
the  effects  of  outside  disturbances  on  the  plant. 

In  general,  the  primary  issue  in  spatiotemporal 
pattern  recognition  is  to  build  classifiers  that  are 
insensitive  to  certain  spatiotemporal  warping 
transformations  (such  as  pitch  change  and  time 
warping  in  speech  recognition).  The  primary  issue 
in  control  is  to  build  causal  recursive  controllers 
(i.e.,  controllers  that  operate  in  discrete  time  to 
map the set {x(0), x(1), u(1), x(2),
u(2), ..., x(t - 1), u(t - 1)} into u(t)) that perform
well  with  respect  to  some  particular  set  of  goals. 
The  graded  learning  network  and  the  recurrent 
backpropagation  network  are  useful  for  such 


control  applications.  The  hierarchical  matched 
filter  network  is  useful  for  pattern  recognition 
problems  where  there  is  a  desire  to  be  insensitive  to 
time  warps  (a  class  of  spatiotemporal  warping 
transformations  that  map  a  spatiotemporal  pattern 
x(t) into a pattern x(θ(t)), where θ is a strictly
monotonically  increasing  smooth  scalar  function  of 
time). 
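A time warp of a sampled pattern can be sketched as follows. This is only an illustration under stated assumptions: the pattern is held as an array of samples, values between samples are obtained by linear interpolation, and the function name `time_warp` and the example warp θ(t) = t² are hypothetical choices, not part of the networks described in this paper.

```python
import numpy as np

def time_warp(pattern, times, theta):
    """Return samples of x(theta(t)) for a pattern x sampled at `times`.

    pattern: array of shape (T, n), one n-dimensional vector per sample.
    times:   strictly increasing array of shape (T,).
    theta:   strictly increasing scalar function of time (the warp).
    """
    warped_times = theta(times)
    if np.any(np.diff(warped_times) <= 0):
        raise ValueError("theta must be strictly increasing")
    # Interpolate each component separately; np.interp clamps values
    # requested outside the sampled interval to the end points.
    return np.column_stack(
        [np.interp(warped_times, times, pattern[:, i])
         for i in range(pattern.shape[1])]
    )

times = np.linspace(0.0, 1.0, 101)
pattern = np.column_stack([np.sin(2 * np.pi * times), times])
# Assumed example warp: slow at the start of the interval, fast at the end.
warped = time_warp(pattern, times, lambda t: t ** 2)
```

A classifier that is insensitive to time warps should assign `pattern` and `warped` to the same class.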


3  A  Fully  Connected 
Network  Topology 

A simple yet very powerful network topology is
that of a single fully connected layer of processing
units (see [10] for a discussion of the capabilities of
this  topology),  such  as  shown  in  Figure  1,  which 
consists  of  a  single  functional  layer  of  N  units.  To 
simplify the discussion, an additional layer of
fanout units is included. This layer distributes
both the fed back output signals of the N
functional units, and the n components x_1(t - 1),
x_2(t - 1), ..., x_n(t - 1) of the input vector x(t - 1)
(the input vector used during the network's
operation at time t is latched into the fanout units
at time t - 1 along with the fed back processing
element output signals from time increment t - 1).
Each of the N processing elements of the functional
layer also receives a bias input, which we shall label
x_0(t - 1), where x_0(t) = 1.0 for all values of t. The
number of fanout units is equal to 1 + n + N.

The outputs of the network at time t are the
outputs y'_1(t), y'_2(t), ..., y'_m(t) of the first m
processing  elements  of  the  functional  layer  of  the 
network.  The  output  signals  of  the  remaining  units 

To simplify the notation, we define

$$z_j(t) = \begin{cases} x_j(t) & \text{if } 0 \le j \le n \\ y'_{j-n}(t) & \text{if } (n+1) \le j \le L \end{cases} \qquad (1)$$

where j = 0, 1, 2, ..., L and L = N + n. For
convenience we shall assume that time always
begins at t = 0.

On time step t, processing element i calculates
its output signal y'_i(t) by means of the formula

$$y'_i(t) = s_i(I_i(t)), \quad i = 1, 2, \ldots, N \qquad (2)$$

$$I_i(t) = \sum_{j=0}^{L} w_{ij}\, z_j(t-1) \qquad (3)$$

where each of the functions s_i(u) is bounded and
has a continuous derivative. A typical functional
form for s_i(u) is the bipolar logistic function given
by

$$s_i(u) = \frac{2}{1 + e^{-2u}} - 1 \qquad (4)$$

This function is bounded between -1 and +1 and
has a slope of 1 at zero.

Figure 1: Single layer of fully connected processing units.

To  solve  a  spatiotemporal  mapping  problem  with 
the  network  topology  shown  in  Figure  1,  the 
connection  weights  must  be  learned  from  a  set  of 
examples  of  the  mapping.  The  next  two  sections 
describe  learning  methods  that  yield  good 
connection  weights  for  this  network  topology. 
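The operation of this topology can be sketched in a few lines. The sketch assumes that every unit uses the hyperbolic tangent as its squashing function (it is bounded between -1 and +1 and has slope 1 at zero); the function names `step` and `run`, the weight layout, and the random weights are illustrative assumptions only.

```python
import numpy as np

def step(W, x_prev, y_prev):
    """One time step of the fully connected functional layer.

    W:      weights, shape (N, 1 + n + N); column 0 multiplies the bias.
    x_prev: input vector x(t-1), shape (n,).
    y_prev: fed back outputs y'(t-1), shape (N,).
    Returns y'(t), shape (N,).
    """
    # Fanout layer: bias, then the n inputs, then the N fed back outputs.
    z = np.concatenate(([1.0], x_prev, y_prev))
    return np.tanh(W @ z)  # assumed squashing function

def run(W, xs, y0, m):
    """Feed x(0), ..., x(t_stop - 1); return the outputs of the first m
    units at times t = 1, ..., t_stop."""
    y, outputs = np.asarray(y0, dtype=float), []
    for x in xs:
        y = step(W, x, y)
        outputs.append(y[:m])
    return np.array(outputs)

rng = np.random.default_rng(0)
n, N, m = 2, 4, 1                      # hypothetical sizes
W = rng.normal(scale=0.5, size=(N, 1 + n + N))
xs = rng.normal(size=(10, n))
ys = run(W, xs, np.zeros(N), m)
```

Only the first m units are read out; the remaining units act as internal state that is carried from one time step to the next.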

4  Recurrent  Backpropagation 
Network 

The  recurrent  backpropagation  network  learns  to 
approximate  a  mapping  between  a  sequence  of  n 
dimensional  input  vectors  and  a  sequence  of  m 
dimensional  output  vectors.  The  mapping  is 
learned  using  a  form  of  supervised  learning  to 
adapt  the  weights.  Supervised  learning  requires 
that some of the outputs of the network be known


for  some  or  all  of  the  input  vectors  in  the  sequence. 
In  general,  such  information  is  more  difficult  to 
acquire  than  a  measurement  of  performance.  Thus, 
recurrent  backpropagation  is  more  restrictive  than 
the  graded  learning  network  in  terms  of  the  types 
of  problems  that  it  can  address.  However,  when 
supervised  learning  can  be  used  it  will  in  general 
produce  a  network  that  is  superior  to  graded 
learning  both  in  terms  of  required  training  time 
and  approximation  accuracy.  Thus,  when 
supervised  learning  can  be  used  it  should  be. 

4.1  Recurrent  Backpropagation 
Error  Function 

Unlike  the  graded  learning  network,  recurrent 
backpropagation  has  a  fixed  error  function  that  it 
tries  to  minimize  during  training.  This  error 
function  is  a  spatiotemporal  generalization  of  the 
mean  squared  error  function  used  in 
backpropagation.  To  understand  this  error 
function,  we  must  first  define  the  exact  problem 
that  recurrent  backpropagation  attempts  to  solve. 

Let the input to and output from the system at
time t be x(t - 1) and y'(t), respectively. We shall
assume that the system starts operation at t = 1.
Initial values for the internal states of the system
at time t = 0 are uniquely defined by the initial
values of the output signals (in other words, by the
vector y'(0)). The system runs forward in time
until some arbitrary stopping time t_stop is reached.
During each of these 'runs' of the system the input
sequence {x(0), x(1), x(2), ..., x(t_stop - 1)} is
provided to the system. From the initial state of
the system, y'(0), and the sequence of x(t) inputs,
the system produces outputs
{y'(1), y'(2), ..., y'(t_stop)}. Clearly then, the
overall purpose of the system on each run is to map
the set

X = {y'(0), {x(0), x(1), x(2), ..., x(t_stop - 1)}}

into the set

Y' = {y'(1), y'(2), ..., y'(t_stop)}.

Thus,  we  can  view  the  operation  of  such  a 
spatiotemporal  system  as  performing  a  mapping 
from  a  set  consisting  of  the  initial  system  state  and 
a  set  of  input  values  provided  over  the  run,  to  a  set 
consisting  of  the  output  states  produced  by  the 
system over the run. The confusing thing about
this picture is that in many practical instances
(such as most control systems) the x(t) inputs are
functionally dependent upon earlier y'(t) outputs.
The key observation is that this doesn't matter.
The only effect this has is to limit the range of
possibilities for the x(t) sequences that the system
will see. We are only concerned with what the
system does when a particular sequence of x(t)
inputs is presented (given a certain initial state of
the system). We don't care how these inputs arose.

The error calculation procedure for the recurrent
backpropagation network is similar to that used
with the backpropagation network, but with one
important difference: not all correct output signals
are known. In the case of recurrent
backpropagation, we assume that with each
training run example X we are also given
information concerning some of the correct values
of outputs of the network at various points during
the run. Specifically, we assume that at each time
t, 1 ≤ t ≤ t_stop, during a training run we are given a
set U(t) of integers lying in the range from 1 to m,
inclusive, such that the correct output value y_k(t)
for unit k at time t is given for each k ∈ U(t). It is
perfectly acceptable to have U(t) be the empty set
at some times t during the training run. However,
for there to be useful training, U(t) must be
non-empty for at least one time t during training.

Given the sets U(t), and the correct y_k(t) values
for each k ∈ U(t), we define the mean squared error
F(w) of the recurrent backpropagation network to
be

$$F(w) = E\left[\frac{1}{K} \sum_{t=1}^{t_{stop}} \sum_{k \in U(t)} \left[y_k(t) - y'_k(t)\right]^2\right] \qquad (5)$$

$$K = \sum_{t=1}^{t_{stop}} \#U(t) \qquad (6)$$

where w is the weight vector of the network,
#U(t) = the number of elements in U(t)
(#U(t) = 0 if U(t) is empty), and E[ ] is the
expectation or averaging operator (the averaging is
done over an unboundedly large number of input
examples chosen randomly with respect to a fixed
probability density function p). Note
that  the  entire  sum  is  divided  by  K,  the  total 
number  of  error  terms  used.  Thus,  we  are 
measuring  the  average  squared  error  per  output  for 
which  the  correct  output  is  given.  This  quantity  is 
then  averaged  over  the  entire  input  space  by  the 
expectation  operator.  Again,  as  with 
backpropagation,  the  mean  squared  error  depends 
only  on  the  weights.  Naturally,  for  this  dependency 
to  hold,  it  must  be  assumed  that  the  weights  are 
fixed  throughout  the  evaluation  of  the  network’s 
performance. 
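For a single run, the quantity inside the expectation operator can be computed as sketched below. The sets U(t) are represented by a boolean mask, which is an assumption made for convenience; the function name `run_error` and the example numbers are hypothetical. The expectation itself would be approximated by averaging this value over many runs.

```python
import numpy as np

def run_error(y_target, y_actual, known):
    """Average squared error over the outputs whose correct value is given.

    y_target: correct outputs y_k(t), shape (t_stop, m); entries where
              `known` is False are ignored.
    y_actual: network outputs y'_k(t), shape (t_stop, m).
    known:    boolean mask, shape (t_stop, m); known[t, k] is True
              exactly when k belongs to U(t).
    """
    K = int(known.sum())  # total number of error terms used
    if K == 0:
        raise ValueError("U(t) must be non-empty for at least one time t")
    diff = np.where(known, y_target - y_actual, 0.0)
    return float((diff ** 2).sum() / K)

# Three time steps, two outputs; the correct value is given only for the
# first output at the first and last time step.
y_target = np.array([[1.0, 0.0], [0.0, 0.0], [0.5, 0.0]])
y_actual = np.array([[0.5, 9.0], [9.0, 9.0], [0.5, 9.0]])
known = np.array([[True, False], [False, False], [True, False]])
err = run_error(y_target, y_actual, known)
```

Outputs for which no correct value is supplied contribute nothing to the error, however far off they are.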


4.2  Recurrent  Backpropagation 
Network  Learning  Law 

The  recurrent  backpropagation  network  learning 
law  is  based  on  the  standard  gradient  descent 
method 


$$w^{new} = w^{old} - \alpha \nabla_w F(w). \qquad (7)$$

The gradient calculation requires the partial
derivatives of F(w) with respect to the components
of w. The complete derivation of these partial
derivatives  can  be  found  in  [10].  The  result  is  a  set 
of  recursion  formulas 


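$$r_{kij}(t) = s'_k(I_k(t)) \left[ \delta_{ki}\, z_j(t-1) + \sum_{p=1}^{N} w_{k,n+p}\, r_{pij}(t-1) \right] \qquad (8)$$

where

$$r_{kij}(t) = \frac{\partial y'_k(t)}{\partial w_{ij}}, \qquad r_{kij}(0) = 0,$$

$\delta_{ki}$ is the Kronecker delta, and $s'_k$ is the derivative of $s_k$.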


At the end of each run (after all of the z_j(t - 1)
values  are  known),  the  recursion  formulas  in 
Equation  8  can  be  solved.  Naturally,  in  order  to 
adequately  approximate  the  expectation  operator, 
we  must  average  over  a  large  number  of  runs  where 
the  initial  values  and  input  sequence  examples  are 
chosen  randomly  in  accordance  with  a  fixed 
probability  density  function  p.  The  need  to  batch 
the  results  from  a  number  of  runs  before  modifying 
the  weights  makes  this  learning  law  very  slow. 
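The recursion can be carried forward alongside the network's own operation, as in the following sketch. It assumes tanh squashing functions, a weight matrix whose columns are ordered as bias, inputs, fed back outputs, and a single run standing in for the expectation; the function names, sizes, and step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def run_with_sensitivities(W, xs, y0):
    """Run the network and carry r[k, i, j] = d y'_k(t) / d w_ij with it."""
    N, n = W.shape[0], xs.shape[1]
    y = np.asarray(y0, dtype=float)
    r = np.zeros((N, N, W.shape[1]))           # r_kij(0) = 0
    ys, rs = [], []
    for x in xs:
        z = np.concatenate(([1.0], x, y))      # z(t-1)
        y = np.tanh(W @ z)                     # y'(t), tanh assumed
        ds = 1.0 - y ** 2                      # s'_k(I_k(t)) for tanh
        # Feedback term: sum over p of w_{k,n+p} r_pij(t-1).
        fb = np.einsum("kp,pij->kij", W[:, 1 + n:], r)
        # Direct term: delta_ki z_j(t-1).
        direct = np.zeros_like(r)
        direct[np.arange(N), np.arange(N), :] = z
        r = ds[:, None, None] * (direct + fb)
        ys.append(y)
        rs.append(r)
    return np.array(ys), np.array(rs)

def error_gradient(y_target, known, ys, rs):
    """Gradient of the single-run squared error with respect to W."""
    m = y_target.shape[1]
    err = np.where(known, y_target - ys[:, :m], 0.0)
    return -2.0 / known.sum() * np.einsum("tk,tkij->ij", err, rs[:, :m])

rng = np.random.default_rng(0)
n, N, m, t_stop = 2, 3, 1, 5                   # hypothetical sizes
W = rng.normal(scale=0.5, size=(N, 1 + n + N))
xs = rng.normal(size=(t_stop, n))
ys, rs = run_with_sensitivities(W, xs, np.zeros(N))
y_target = np.zeros((t_stop, m))
known = np.ones((t_stop, m), dtype=bool)
grad = error_gradient(y_target, known, ys, rs)
W_new = W - 0.1 * grad                         # one gradient descent step
```

The sensitivity array holds N x N x (1 + n + N) values and is updated on every time step, which indicates why the method becomes expensive as the number of units grows.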

Two  variations  of  this  learning  law  have  been 
developed.  The  first  of  these  updates  w  after  each 
time  step  and  is  known  as  the  jump-every-time-step 
variation.  The  second  updates  w  at  the  end  of  each 
run. Both of these variations can improve the
training  time  of  the  network. 

Another  variant  of  the  recurrent  backpropagation 
learning  law  is  the  teacher-forced  learning  law 
introduced  by  Ronald  Williams  and  David  Zipser 
[18]  (who  also  introduced  Equation  8).  This  variant 
is  like  the  jump-every-time-step  version,  except  for 
two  changes.  First,  all  of  the  correct  output  values 
y_k(t) that we are given for training are used in the
recursion equation (Equation 8) in place of the
corresponding y'_k(t) values. Second, after each
weight jump the r_kij(t) value used to compute the
jump is set to zero. Williams and Zipser report
that, at least for some problems, the teacher-forced
learning  law  seems  to  converge  to  a  useful  solution 
faster  than  the  original  learning  law  or  the  two 
other  variants. 

It  is  worth  noting  that  the  above  derivation 
assumes  that  the  inputs  to  the  network  do  not 
depend upon the weight values. For many practical
problems,  such  as  in  control,  this  assumption  will 
be  false  because  of  the  fact  that  the  input  is  derived 
from  the  output  (for  example,  by  a  plant  that  takes 
control  signal  outputs  from  the  network,  which 
definitely  depend  on  the  weights,  and  produces 
sensor  inputs  to  the  network  —  which  therefore 
also  depend  upon  the  weights).  Thus,  in  using  the 
recurrent  backpropagation  network,  this  limitation 
must  always  be  kept  in  mind.  However,  this  isn’t 
to  say  that  the  method  is  unusable  in  these  cases. 
Often, the dependence of the input on the weights
is small, in which case the method may still work.


5  The Graded Learning
Network

The graded learning network (GLN) is a mapping
neural network which uses a form of reinforcement
learning in which a performance measure or grade
is periodically presented to the network to guide
learning [4]. It combines the well known
optimization characteristics of simulated
annealing [13,7] with the speed advantages of a
gradient search method. The result is a powerful
new method of optimization for a broad class of
problems, including guidance and control.

Unlike supervised learning networks such as
backpropagation, GLN does not require the desired
output to be furnished for each training trial. Only
a measure of overall network performance over a
series of training trials is required. This is very
significant for problems in guidance and control,
since these problems are often characterized by a
lack of knowledge of the desired output for a given
training trial.

5.1  GLN Advantages

While GLN is not the only form of reinforcement
learning network, it does have two distinct
advantages over other such networks:

1.  The GLN learning law does not specify the
form of the grading function.

2.  The GLN learning law is not coupled to the
network topology.

The first of these advantages implies that the
grade must be furnished by an entity external to
the network. This external entity is typically some
type of monitoring module which can assess the
overall performance of the system. Such a
performance measure can be very complex and
often involves significant time delays between the
network response and the measurement. In general,
the grade measurement can be based upon any
factors that are consistently and repeatably a
function of the input-output behavior of the
network.

The second GLN advantage allows it to be
applied to arbitrary network topologies. In
particular, GLN can be used with topologies that
involve feedback connections between and within
layers of processing elements. Thus, GLN can be
applied to problems that have complex dynamical
response with unknown or uncertain time delays.

5.2  Description of Graded Learning
Operation 

The  learning  law  for  the  graded  learning  network  is 
executed  whenever  a  grade,  G,  is  presented.  Upon 
such  a  presentation,  the  network  adjusts  its  weights 
using  a  training  process  that  is,  roughly  speaking,  a 
biased  form  of  Cauchy  simulated  annealing  [15]. 
The  bias  is  based  on  an  estimate  of  the  gradient  of 
the  grading  function.  Thus,  each  weight 
adjustment  is  a  combination  of  a  grading  function 
gradient estimate and a Cauchy random jump. A
temperature  parameter  determines  the  average  size 
of  this  random  jump. 

In  the  following  discussion  of  GLN  learning,  it 
will  be  convenient  to  define  the  network  weight 
vector,  w,  as  the  vector  containing  the  weights  of 
all  the  units  in  the  functional  layer,  including  the 
bias  weights.  The  dimensionality  of  w  is 
q = (1 + n + N)N.

In  addition  to  w,  GLN  maintains  three  other 
vectors  of  the  same  dimensionality  for  use  during 
training.  The  first  of  these  vectors,  a,  is  an 
estimate  of  the  gradient  of  the  grading  function 
with  respect  to  the  network  weight  vector.  The 
second  vector,  b,  contains  the  network  weight 
vector  that  thus  far  has  yielded  the  best  (lowest) 
grade value. The final vector, c, contains the
random  jump  values. 

When a grade, G, is presented to the network, it
is first checked to determine if it is better or worse
than the current best grade, G_best(t). Subsequent
processing depends on the outcome of this check:

case 1: G < G_best(t)

$$G_{best}^{new} = \alpha G + (1 - \alpha)\, G_{best}^{old}$$

rpnew  _ 

case 2: G ≥ G_best(t)

=  da”'*' -f  c/.c®'*'., 

where α, β, γ, δ, ε, θ, and φ are parameters.

Typical  values  for  these  parameters  are  given  in 
Table  1. 

Following these changes, the c and w vectors are
updated as follows:

$$c^{new} = a^{new} + T r, \qquad (9)$$

where r is a q-dimensional Cauchy random variable
(see [15]). Finally, the new weight vector is
calculated:

$$w^{new} = b + c^{new}. \qquad (10)$$

Table 1: Typical GLN Training Parameters

| Parameter | Typical value |
| α         | 0.99          |
| β         | 1.01          |
| γ         | 0.85          |
| δ         | 0.25          |
| ε         | 0.995         |
| θ         | 0.85          |
| φ         | -0.15         |

After  w  is  updated,  the  network  is  run  once 
again,  with  this  new  weight  vector,  to  generate  a 
new grade. The process of weight updating can be
continued  indefinitely  (e.g.,  if  the  plant  or  its 
environment  are  expected  to  change  significantly 
over  time),  or  it  can  be  turned  off  when  a 
satisfactory  level  of  performance  is  obtained. 
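The overall flow of a biased Cauchy search of this kind can be sketched as follows. The sketch keeps the three ingredients named above (a best weight vector b, a bias vector a, and a jump c formed from the bias plus a temperature-scaled Cauchy sample) but the specific rules used here to update the bias and the temperature, and all numerical constants, are assumptions chosen for illustration; they are not the GLN learning law of [4]. The grade is taken to be any function of the weight vector for which lower values are better.

```python
import numpy as np

def graded_search(grade, w0, steps=500, temperature=0.1, seed=0):
    """Biased Cauchy random search (illustrative stand-in, not the GLN law).

    grade: function mapping a weight vector to a scalar (lower is better).
    w0:    initial weight vector.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(w0, dtype=float).copy()   # best weight vector so far
    g_best = grade(b)
    a = np.zeros_like(b)                     # bias applied to each jump
    for _ in range(steps):
        # Jump = bias + temperature-scaled Cauchy sample, taken from b.
        c = a + temperature * rng.standard_cauchy(b.shape)
        w = b + c
        g = grade(w)
        if g < g_best:
            b, g_best = w, g
            a = 0.85 * a + 0.25 * c          # assumed: lean toward the jump
            temperature *= 1.01              # assumed
        else:
            a = 0.85 * a - 0.15 * c          # assumed: lean away from it
            temperature *= 0.995             # assumed
    return b, g_best

# Hypothetical grade: squared distance of the weights from a fixed target.
target = np.array([1.0, -2.0, 0.5])
grade = lambda w: float(np.sum((w - target) ** 2))
w0 = np.zeros(3)
b, g_best = graded_search(grade, w0)
```

In a control setting the call to `grade` would stand for running the network on the plant and letting an external monitor score the result.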

6  Hierarchical  Matched 
Filter  Neural  Network 

The hierarchical matched filter network is designed
to perform spatiotemporal pattern classification
using a generalized multidimensional matched
filter. Traditionally, matched filtering has been
used in application areas such as communications,
radar, and sonar, for detecting a specific waveform
in a time series signal. The generalized
multidimensional matched filter is optimized for
spatiotemporal pattern classification. Banks of
these matched filters can be used as
high-performance classifiers for spatiotemporal
patterns. Unfortunately, the direct implementation
of such matched filter banks for large problems
(such as large-vocabulary continuous speech
recognition), while attractive, is not practical.
However, it may be possible to develop a method
for exploiting the inherent statistical redundancy of
typical spatiotemporal pattern sets to allow more
efficient implementations of such matched filter
banks. In particular, we propose a hierarchical
neural network approach to this implementation
problem.
6.1  Matched Filtering

One well-known method of pattern recognition is
template matching or nearest neighbor classification
[5,6], in which unknown patterns are simply
compared with known examples (using an
appropriate distance measurement procedure) to
find the closest matching examples. Given a
sufficiently rich set of example patterns, such
classifiers can be shown to be near-optimal.
However, for practical problems, classifiers with a
sufficiently large number of example patterns are
often impractical.

Given two spatiotemporal patterns, u(t) and
v(t), we want to create a matched filter distance
measurement that is invariant, or at least
insensitive, to the distortion of patterns by some
preselected class C of spatiotemporal warping
transformations. For example, if we wished to be
insensitive to small time warps, we might define the
class C to consist of transformations of the form
u(t) → u(θ(t)) where 0.5 < dθ/dt < 2.3. Of
course, C might consist of much more complicated
transformations.

One choice for the distance measurement
H_v(u,t) that is invariant with respect to a class C
of spatiotemporal warping transformations, and
which only operates locally in time, is

$$H_v(u,t) = \inf_{T \in C} \int \mu(\tau - t)\, |u(\tau) - Tv(\tau)|\, d\tau, \qquad (11)$$

where μ is a non-negative smooth function with
μ(τ) > 0 for τ ∈ (-a, 0] (where a is a non-negative
constant) and μ(τ) = 0 otherwise, and where C is a
defined set of spatiotemporal warping
transformations. The function μ is called a time
windowing function. It serves the purpose of
focusing the attention of the distance measurement
on the time interval (t - a, t]. H can be interpreted
as the distance between the spatiotemporal pattern
u over the time interval [t - a, t], and the best
matching warped portion (of duration [t - a, t]) of
v. H_v(u,t) is called the generalized
multidimensional matched filter (or simply matched
filter, since we shall not use the traditional version
in the sequel) for input spatiotemporal pattern u,
tuned to spatiotemporal pattern v, over
spatiotemporal warp class C.
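A discrete approximation of this distance measurement can be sketched as follows. Several assumptions are made for the sake of a runnable example: the class C is replaced by a finite list of candidate time warps (so the infimum becomes a minimum), the integral is approximated by a sum over samples, the norm is Euclidean, and the window is a simple ramp that is positive on (-a, 0] rather than a smooth function. All names are hypothetical.

```python
import numpy as np

def window(s, a):
    """Time window: positive on (-a, 0], zero elsewhere (a simple ramp)."""
    return np.where((s > -a) & (s <= 0), 1.0 + s / a, 0.0)

def matched_filter_distance(u, v, t, a, warps, samples=200):
    """Approximate the matched filter output for input u, tuned to v.

    u, v:  functions mapping an array of times, shape (S,), to an array
           of pattern vectors, shape (S, n).
    warps: finite list of time warp functions theta standing in for C.
    """
    tau = np.linspace(t - a, t, samples)
    dtau = tau[1] - tau[0]
    mu = window(tau - t, a)
    u_vals = u(tau)
    best = np.inf
    for theta in warps:
        diff = np.linalg.norm(u_vals - v(theta(tau)), axis=1)
        best = min(best, float(np.sum(mu * diff) * dtau))
    return best

# Reference pattern v and an input u that is v played 1.5 times faster.
v = lambda tau: np.column_stack([np.sin(2 * np.pi * tau),
                                 np.cos(2 * np.pi * tau)])
u = lambda tau: v(1.5 * tau)
# Candidate warps: linear time scalings with slopes 0.5, 0.6, ..., 2.0.
warps = [lambda tau, s=s: s * tau for s in np.linspace(0.5, 2.0, 16)]
d_warped = matched_filter_distance(u, v, t=1.0, a=0.5, warps=warps)
d_plain = matched_filter_distance(u, v, t=1.0, a=0.5,
                                  warps=[lambda tau: tau])
```

With the warp class available the distance between u and v collapses to nearly zero, while the same comparison without any warping reports a substantial mismatch.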


6.2  The  Nearest  Matched  Filter 
Classifier 

One  way  of  building  a  pattern  classifier  for 
spatiotemporal  patterns  is  to  gather  many 
examples  of  patterns  belonging  to  each  of  the  M 
classes  into  which  each  unknown  input  pattern  is  to 
be  placed.  An  unknown  spatiotemporal  pattern 
can  then  be  compared  with  these  examples  at  each 
time  t,  by  means  of  matched  filters  based  upon  the 
example  patterns,  to  determine  (via  a  classification 
decision  policy)  whether  a  pattern  belonging  to  any 
one  of  the  M  classes  has  just  finished  arriving  or 
not.  This  is  the  nearest  matched  filter  classifier. 

To make the notation concrete, let us define such
a training set of patterns to be the set
P = {(v_1, β_1), (v_2, β_2), ..., (v_N, β_N)}, where
β_k ∈ {1, 2, ..., M} is the number of the class to
which example pattern v_k belongs. The input
signal, u, is fed to all of these matched filters in
parallel (the matched filters use weighting functions
that are balanced so that their responses are
comparable). The output of the classifier at time t
is a class number determined by putting the
outputs of all N matched filters into a decision
policy function. For example, if we wanted to use a
simple 1-nearest neighbor policy, we would emit at
each time t the class number β_i associated with the
reference pattern v_i having the smallest matched
filter output H_{v_i}(u,t), unless the value of the
smallest matched filter output exceeded a fixed
threshold, in which case we would provide a class
number output of 0, meaning that the input signal
does not currently match any example pattern well.
Clearly, the pattern class output typically will not
be smooth (it will jump abruptly from one class
number to another as the winning classifier of the
competition process changes). The generalized
multidimensional  matched  filter  and  the  nearest 
matched  filter  classifier  (along  with  a  neural 
network  implementation  of  the  classifier  for  time 
warps)  were  introduced  in  1982  [12].  For  further 
information  and  discussion  of  these  concepts,  see 
[10]. 
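The 1-nearest neighbor decision policy with a rejection threshold described above can be sketched as follows. The matched filter outputs are assumed to have been computed already; the function name and the example numbers are hypothetical.

```python
import numpy as np

def nearest_matched_filter_class(filter_outputs, class_numbers, threshold):
    """Return the class of the reference pattern with the smallest matched
    filter output, or 0 if even that output exceeds the threshold.

    filter_outputs: the N matched filter outputs at the current time t.
    class_numbers:  the class number of each reference pattern (1..M).
    """
    i = int(np.argmin(filter_outputs))
    if filter_outputs[i] > threshold:
        return 0  # the input matches no example pattern well
    return int(class_numbers[i])

classes = np.array([1, 2, 2])
good_match = nearest_matched_filter_class(np.array([0.9, 0.2, 0.5]), classes, 0.3)
no_match = nearest_matched_filter_class(np.array([0.9, 0.8, 0.7]), classes, 0.3)
```

Evaluating this policy at every time step yields the integer-valued class number function of time discussed in Section 2.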

The  nearest  matched  filter  classifier  can  be 
defined  for  a  variety  of  spatiotemporal  warping 
transformations.  However,  common  choices  might 
be  time  warping  or  pitch  change  transformations. 
Time  warping  would  be  useful,  for  example,  for 
speech  recognition,  where  the  changes  in  how 
words  are  pronounced  are  typically  of  a  time-warp 
nature.  Pitch  change  transformations  (such  as 
those  that  occur  when  we  speed  up  or  slow  down  a 
phonograph record) would be useful for recognizing
vehicles  by  their  sounds,  since  much  of  the  sound  of 
a  vehicle  is  from  its  engine,  transmission,  and 
wheels,  which  produce  sounds  at  pitches  that  are 
directly  dependent  on  road  speed  and  gear 
selection.  In  every  case,  the  use  of  an  appropriate 
class  of  transformations  will  ensure  that  each 
reference  pattern  can  serve  as  a  model  for  a  wide 
class  of  similar,  but  transformed,  patterns.  This 
effective  pattern  reuse  greatly  reduces  the  number 
of reference patterns that must be used.

Finally, the theoretical classification performance
of the nearest matched filter classifier has been
established for the case where C is the set of time
translations [11]. In this case, assuming that the
training set is sufficiently comprehensive (and
that a 1-nearest neighbor classification decision
policy is employed), the classifier error rate R will
satisfy the Cover and Hart inequality [3]

$$R^* \le R \le R^* \left(2 - \frac{M}{M-1} R^*\right) \qquad (12)$$

where R* is the error rate of the Bayes classifier.
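To give a feel for how tight such a guarantee is, the following sketch evaluates the Cover and Hart bounds in their usual large-sample form, in which the 1-nearest neighbor error rate lies between the Bayes error rate R* and R*(2 - M R*/(M - 1)) for M classes. That this is the form intended here is an assumption, and the function name is hypothetical.

```python
def cover_hart_bounds(bayes_error, num_classes):
    """Lower and upper bounds on the large-sample 1-nearest neighbor
    error rate, given the Bayes error rate and the number of classes."""
    max_error = (num_classes - 1) / num_classes
    if not 0.0 <= bayes_error <= max_error:
        raise ValueError("Bayes error must lie in [0, (M - 1) / M]")
    upper = bayes_error * (2.0 - num_classes / (num_classes - 1) * bayes_error)
    return bayes_error, upper

# Two classes with a Bayes error rate of 10 percent.
lower, upper = cover_hart_bounds(0.10, 2)
```

For small Bayes error rates the upper bound is close to twice the Bayes error rate, so the nearest neighbor rule is at worst about a factor of two from optimal in that regime.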

The  nearest  matched  filter  classifier  has  one 
problem,  and  two  advantages.  The  problem  is  that 
we may need an enormous training set; this
requirement  may  make  the  direct  implementation 
of  such  a  classifier  impossibly  large  and 
computationally  burdensome  (since  all  N  of  the 
H_{v_k}(u,t) integrals must be computed in parallel).
The  advantages  are  that  the  classifier  is  capable  of 
near-Bayesian  performance  (at  least  for  some 
classes  of  spatiotemporal  warping  transformations), 
and  that  the  individual  matched  filters  are 
insensitive  to  noise.  This  latter  advantage  is 
particularly  important  if  all  of  the  matched  filters 
are  using  the  same  weighting  function  (as  opposed 
to  weighting  functions  that  merely  have  the  same 
time integral), since Equation 11 shows that all of
the  matched  filters  will  then  react  approximately 
the  same  to  additive  noise.  Thus,  since  the  decision 
process  is  typically  largely  a  relative  comparison  of 
the  matched  filter  outputs,  the  classifier  output  will 
be  somewhat  insensitive  to  additive  noise.  The 
combination  of  guaranteed  high  classification 
accuracy  (given  our  ability  and  willingness  to 
implement  a  sufficient  training  set)  and  additive 
noise insensitivity makes the nearest matched filter
classifier  an  interesting  candidate  for  solving 
spatiotemporal  classification  problems. 

Finally, because the windowing function limits
the consideration of the incoming spatiotemporal
pattern to the time interval $[t - a, t]$, the nearest
matched filter classifier can carry out only the first
local-in-time stage of spatiotemporal pattern
recognition. For many problems, local-in-time
classification is not sufficient. Often, to do a good
job of classification, we must exploit context
information that we can obtain only by considering
longer periods of time. One way to do this would
be to devise a classification decision policy function
that could exploit a priori syntax and context
information. Because such a postprocessing
operation is often essential if adequate performance
is to be achieved, the nearest matched filter
classifier should really be thought of as just a front
end for a complete classifier. We now consider the
problem of implementing a nearest matched filter
classifier in a hierarchical neural network structure.

6.3 Nearest Matched Filter Classifier Implementation

A  neural  network  that  approximately  implements 
the  nearest  matched  filter  classifier  for  the  class  of 
time  warp  spatiotemporal  warping  transformations 
(see  [10])  has  the  disadvantage  that  it  requires  one 
sub-network  for  each  example  pattern  in  the 
training  set.  Thus,  the  size  of  the  network  grows 
linearly  with  the  size  of  the  training  set. 

For  many  problems,  such  as  continuous  speech 
recognition,  the  patterns  in  the  training  set  will  be 
highly  redundant.  In  other  words,  these  patterns 
will  have  many  sub-patterns  (phonemes,  for 
example)  in  common — usually  at  several  different 
time  duration  levels.  Thus,  from  a  statistical 
perspective,  a  direct  implementation  of  such  a 
nearest  matched  filter  classifier  will  be  highly 
inefficient,  since  each  matched  filter  will  contain 
units  that  are  tuned  to  essentially  the  same 
short-term  patterns  that  a  multitude  of  other  units 
are  also  tuned  to.  Consolidating  these  units  would 
decrease  the  size  of  such  an  implementation 
enormously  -  perhaps  making  such  systems 
practical.  This  section  presents  an  outline  of  a 
scheme  for  accomplishing  this  consolidation  by 
means  of  a  new  hierarchical  design. 

Figure  2  shows  a  design  for  a  self-organizing 
spatiotemporal  feature  detector  layer.  This  layer 
learns  short  time  sequences  of  patterns  in  a  way 
that  makes  it  insensitive  to  small  time  warps. 
Perhaps the best way to describe the function of
this layer is to begin with a description of how it is
trained. Then its function during normal operation
will be described.

Figure 2: Schematic for a self-organizing spatiotemporal feature detector layer.

During  both  training  and  normal  operation  of 
the  hierarchical  neural  network  classifier,  we 
assume  that  the  spatiotemporal  patterns  are 
entered  into  the  hierarchy  at  the  bottom  as 
sequences  of  vector  inputs  in  discrete  time.  The 
sample  rate  is  greater  than  the  Nyquist  rate  for  the 
fastest  varying  component  of  the  pattern.  Further, 
we  assume  that  the  individual  patterns  to  be 
classified  have  durations  that  are  all  approximately 
the  same  (this  condition  is  not  necessary,  but 
relaxing it adds complications that will be avoided
in  this  paper).  The  patterns  are  assumed  to  arrive 
in  a  random  order  described  by  a  fixed  probability 
density.  The  only  spatiotemporal  warping 
transformations  are  assumed  to  be  mild  time 
warps.  Given  these  assumptions,  we  now  consider 
the  training  of  layer  m  of  the  hierarchy.  We  assume 
that  all  of  the  previous  layers  have  already  been 
trained  and  their  weights  have  been  frozen. 

The first step in training layer m is to train the
spatial weight vectors $\mathbf{w}_{m1}, \mathbf{w}_{m2}, \ldots, \mathbf{w}_{mN}$. These
are trained using Kohonen learning with conscience
(see [10] for details), with each successive training
trial utilizing the next discrete time sample of
input $\mathbf{x}_{m-1}(t)$ from the previous layer as the
training vector. The learning rate constant $\alpha$ starts
off at a value near 1.0 and decreases to 0 in
accordance with a cooling schedule.

After this training process converges, the $\mathbf{w}_{mi}$
vectors will be distributed in space such
that each time sample $\mathbf{x}_{m-1}(t)$ of the input to
layer m is equally likely to be closest (measured
using Euclidean distance) to each of the $\mathbf{w}_{mi}$
weight vectors. At this point these spatial weight
vectors are frozen and the training of the $\mathbf{z}_{mi}$
temporal weight vectors begins.
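
The spatial weight training step can be sketched roughly as follows. This is only an illustration under stated assumptions: the conscience mechanism is modeled as a win-frequency bias subtracted from the squared distance (one common formulation, not necessarily the one detailed in [10]), the cooling schedule is taken to be linear, and the function name `train_spatial_weights` and the parameters `bias_gain` and `freq_rate` are hypothetical.

```python
import numpy as np


def train_spatial_weights(samples, n_units, n_passes=10,
                          bias_gain=10.0, freq_rate=1e-3, seed=0):
    """Competitive learning with a win-frequency 'conscience' bias.

    samples : array of shape (T, D), the input sequence in time order.
    Returns (weights, win_freq) with shapes (n_units, D) and (n_units,).
    bias_gain and freq_rate are illustrative values, not from the paper.
    """
    samples = np.asarray(samples, dtype=float)
    rng = np.random.default_rng(seed)
    # Start each weight vector on a randomly chosen input sample.
    start = rng.choice(len(samples), size=n_units, replace=False)
    weights = samples[start].copy()
    # Running estimate of how often each unit wins; ideal value is 1/N.
    win_freq = np.full(n_units, 1.0 / n_units)

    total_steps = n_passes * len(samples)
    step = 0
    for _ in range(n_passes):
        for x in samples:  # successive discrete-time samples
            # Assumed linear cooling schedule: alpha falls from 1.0 toward 0.
            alpha = 1.0 - step / total_steps
            dist2 = np.sum((weights - x) ** 2, axis=1)
            # Conscience: handicap units that have been winning too often.
            bias = bias_gain * (1.0 / n_units - win_freq)
            winner = int(np.argmin(dist2 - bias))
            # Kohonen update: move only the winner toward the input.
            weights[winner] += alpha * (x - weights[winner])
            won = np.zeros(n_units)
            won[winner] = 1.0
            win_freq += freq_rate * (won - win_freq)
            step += 1
    return weights, win_freq
```

The bias term is what pushes the layer toward the equiprobable-winner condition described above: a unit whose win frequency exceeds 1/N is penalized until other units catch up.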

Before  temporal  weight  training  begins,  the 
processing  elements  are  modified.  Unlike  spatial 
weight  training,  where  the  processing  elements 
simply  respond  at  each  discrete  time  to  the 
distance  from  the  current  input  to  the  unit’s 
spatial  weight  vector,  now  in  temporal  weight 
training,  the  reaction  to  inputs  will  have  a 
temporal  behavior.  Specifically,  each  processing 
element  will  now  be  governed  by  equations  such  as 


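$$x_{mi}(t) = \alpha\bigl(-c\,x_{mi}(t-1) + d\,U(\psi - |\mathbf{w}_{mi} - \mathbf{x}_{m-1}(t)|)\bigr), \qquad 0 \le x_{mi}(t) \le 1,$$

where

$$\alpha(\zeta) = \begin{cases} \zeta & \text{if } \zeta \ge 0 \\ \varphi\,\zeta & \text{if } \zeta < 0, \end{cases}$$

and

$$U(\zeta) = \begin{cases} 1 & \text{if } \zeta > 0 \\ 0 & \text{if } \zeta \le 0, \end{cases}$$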


and where $\mathbf{w}_{mi}$ is the spatial weight vector of unit i
of layer m, and $c$, $d$, $\psi$, and $\varphi$ are positive
constants, with $c, \varphi < 1$.

These equations ensure that each unit is
activated only if the input vector is
sufficiently close to the spatial weight vector $\mathbf{w}_{mi}$
of that unit. The attack function $\alpha$ is used to
ensure that the “spin up” of each unit is faster
than the “spin down”.

Given equations of the above sort, each
processing element within range of the input
vector $\mathbf{x}_{m-1}(t)$ becomes activated. The
constants are chosen so that this activation always
hard limits at 1 within a few time units after the
input vector enters the $\psi$ sphere surrounding its
weight vector. After the input vector leaves this
sphere, the activity of the processing element
slowly decays. Note that by setting the value of $\psi$
correctly it will be possible to ensure that an
approximately constant fraction of the units is
always active, obviating the need for the
development (as yet unachieved) of a “local”
competition mechanism.
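
A minimal sketch of this spin-up and decay behavior for a single unit is given below. It rests on several assumptions that the text does not confirm: the attack function is taken to pass non-negative arguments unchanged and to scale negative ones by $\varphi < 1$; its output is applied as an increment to the previous activation before hard limiting to [0, 1]; and the names `attack` and `step_unit`, along with the constant values, are purely illustrative.

```python
import numpy as np


def attack(zeta, phi):
    """Assumed attack function: full gain upward, reduced gain downward."""
    return zeta if zeta >= 0.0 else phi * zeta


def step_unit(x_prev, w, x_in, c, d, psi, phi):
    """Advance one unit's activation by one discrete time step.

    x_prev : previous activation of the unit, in [0, 1]
    w      : spatial weight vector of the unit
    x_in   : current input vector from the previous layer
    """
    # Unit step U: 1 only when the input lies inside the psi sphere.
    inside = 1.0 if psi - np.linalg.norm(w - x_in) > 0.0 else 0.0
    # Assumption: the attack-function term is added to the old activation.
    change = attack(-c * x_prev + d * inside, phi)
    return float(np.clip(x_prev + change, 0.0, 1.0))


if __name__ == "__main__":
    w = np.array([0.0, 0.0])
    near, far = np.array([0.1, 0.0]), np.array([5.0, 0.0])
    x = 0.0
    for t, x_in in enumerate([near] * 4 + [far] * 6):
        x = step_unit(x, w, x_in, c=0.5, d=0.6, psi=1.0, phi=0.2)
        print(t, round(x, 3))
```

With these illustrative constants the activation saturates within three steps of the input entering the sphere and then loses only about ten percent per step after the input leaves, which is the asymmetry the attack function is meant to provide.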

Given the above unit behaviors, a steady stream
of input patterns is then entered into the system,
and the temporal weights $z_{mij}$ (which are all
initially zero) are modified by means of the
Kosko/Klopf learning law (see Section 3.6 of [10]
for details). This establishes temporal weights in
accordance with commonly encountered sequences
of unit activation.

Following  equilibration,  the  temporal  weights  are 
frozen  (if  desired,  to  improve  later  performance, 
the weights can first be “sharpened” via a
sigmoidal  transformation  before  freezing).  The 
layer  is  now  ready  to  be  prepared  for  use.  To  do 
this,  yet  another  transfer  function  is  introduced. 

Following the freezing of the weights of the unit,
the transfer function employed during operational
use of the layer is inserted into the units. This
transfer function has a form such as

$$x_{mi}(t) = \alpha\Bigl(-c\,x_{mi}(t-1) + d\,U(\psi - |\mathbf{w}_{mi} - \mathbf{x}_{m-1}(t)|)\,\Bigl[\theta + \sum_j z_{mij}\,x_{mj}(t-1)\Bigr]\Bigr), \qquad 0 \le x_{mi}(t) \le 1.$$

The behavior of this transfer function is now briefly
described. First, for activation of unit i of layer m
to occur, the input vector $\mathbf{x}_{m-1}(t)$ from the
previous layer must be passing through the sphere
of radius $\psi$ surrounding the unit’s spatial weight
vector $\mathbf{w}_{mi}$. Second, the activation level reached (if
not hard limited at 1.0) will depend on the sum of
the quantity $\theta$ and the temporal input intensity
$\sum_j z_{mij}\,x_{mj}$. To achieve full activation, the
temporal input intensity must be quite large (this
ensures that during training unit i is frequently
active following the layer m units currently
supplying it highly weighted input). The offset
$\theta > 0$ is used to ensure that units that lie at the
start of learned spatiotemporal sequences will
become at least modestly active, even though they
do not have any predecessor units helping to get
them activated. In the end, this scheme (and other
variants) provides a spatial pattern of activity that
represents a history of the trajectory of the input
$\mathbf{x}_{m-1}(t)$ over the last brief interval of time. The
history recorded by this network layer is expressed in terms
of a set of spatiotemporal segments burned into
the network during training. If the input pattern
deviates too much from one of these trajectory
segments, the layer will not respond much at all.
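
The gating effect of the temporal input can be illustrated with a small two-unit sketch. As before, this is a hedged reading rather than the author's implementation: it assumes the attack-function term is applied as an increment to the previous activation, that the temporal input uses the activations from the previous time step, and that `Z[i, j]` holds the temporal weight from unit j to unit i. The name `step_layer` and all constant values are hypothetical.

```python
import numpy as np


def step_layer(x_prev, W, Z, x_in, c, d, psi, phi, theta):
    """One operational time step for all units of a layer.

    x_prev : (N,) previous activations
    W      : (N, D) frozen spatial weight vectors
    Z      : (N, N) frozen temporal weights, Z[i, j] from unit j to unit i
    x_in   : (D,) current input vector from the previous layer
    """
    dist = np.linalg.norm(W - x_in, axis=1)
    inside = (psi - dist > 0.0).astype(float)   # unit step U
    temporal = theta + Z @ x_prev               # offset plus temporal input
    arg = -c * x_prev + d * inside * temporal
    change = np.where(arg >= 0.0, arg, phi * arg)  # assumed attack function
    return np.clip(x_prev + change, 0.0, 1.0)


if __name__ == "__main__":
    W = np.array([[0.0, 0.0], [5.0, 0.0]])
    Z = np.array([[0.0, 0.0], [2.0, 0.0]])  # unit 1 learned to follow unit 0
    params = dict(c=0.5, d=0.6, psi=1.0, phi=0.2, theta=0.2)
    x_in = np.array([5.0, 0.0])              # input now inside unit 1's sphere
    print(step_layer(np.array([1.0, 0.0]), W, Z, x_in, **params))
    print(step_layer(np.array([0.0, 0.0]), W, Z, x_in, **params))
```

In this toy case unit 1 saturates when its learned predecessor was active, but reaches only a modest activation (set by the offset alone) when the same input arrives without the predecessor, mirroring the role of the offset described above.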

From  the  above  observation  it  is  clear  that  this 
spatiotemporal  layer  is,  in  fact,  acting  as  a 
generalized matched filter bank over a brief interval
of time, with each activity constellation
representing a pattern trajectory segment learned
during training. The transfer function used
precludes constellations from becoming highly
active unless the input matches a learned segment
(unless, of course, the layer has been overloaded). Note that if overloading
occurs the layer can simply be made larger and the
$\psi$ constant can be lowered. This allows the use of
larger  numbers  of  (more  spatially  discriminating) 
units  to  learn  the  spatiotemporal  subtrajectories. 
Note  that  this  layer  will  be  insensitive  to  modest 
time  warps,  due  to  the  gradual  activation  and 
deactivation  behavior  of  the  operational  transfer 
functions. 

The  above  discussion  has  reviewed  the  definition 
of  a  new  matched  filter  for  spatiotemporal  patterns 
and  introduced  a  hierarchical  layered  neural 
network designed to efficiently implement a bank of
such  matched  filters  for  the  purpose  of  achieving 
spatiotemporal  pattern  recognition  that  is 
insensitive  to  small  time  warps.  In  order  to  derive 
the  desired  classification  information,  a  mapping 
network  must  be  employed  that  will  transform  the 
spatial  constellations  of  activity  at  the  highest 
layers  into  a  class  number  and  a  confidence  level. 

The self-organizing layer defined here has only
limited redundancy of spatial pattern
representation (in contrast to the Spatiotemporal
Pattern Recognizer network presented in Section
6.1 of [10], which has enormous redundancy). Each
subsequent  layer  in  the  hierarchy  has  a  time 
constant  1/c  that  is  twice  as  long  as  the  layer 
below.  This  “temporal  compression”  property 
ensures  that  the  activity  constellations  at  higher 
and  higher  layers  act  as  codes  for  longer  and  longer 
sequences  of  spatiotemporal  pattern.  It  is 
conjectured  that,  if  the  layers  are  not  overloaded 
and  if  the  spatiotemporal  patterns  are  sufficiently 
distinct,  these  constellation  codes  will  be  unique. 
Further,  in  general,  if  the  input  pattern  does  not 
resemble  a  pattern  presented  during  training,  then 
none  of  the  layers  will  respond  significantly. 
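
The doubling of time constants can be made concrete with a short calculation. The helper below is only illustrative; the name `layer_decay_constants` and the starting value are assumptions, not values from the paper.

```python
def layer_decay_constants(c_first: float, n_layers: int) -> list[float]:
    """Decay constant c for each layer, lowest layer first.

    The time constant 1/c doubles from one layer to the next,
    so c itself is halved at each step up the hierarchy.
    """
    return [c_first / (2 ** level) for level in range(n_layers)]


if __name__ == "__main__":
    for level, c in enumerate(layer_decay_constants(0.5, 4), start=1):
        print(f"layer {level}: c = {c}, time constant 1/c = {1.0 / c}")
```

With a lowest-layer constant of 0.5, four layers give time constants of 2, 4, 8, and 16 time units, so each higher layer summarizes roughly twice as long a stretch of the input as the layer below it.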

The  architecture  presented  here  moves  us  one 
step  closer  towards  efficient  implementation  of 
large  matched  filter  banks  for  spatiotemporal 
pattern  classification. 

7 Applications to Guidance and Control

In  this  section  we  will  review  a  number  of 
applications of interest to guidance and control
problems. 

7.1  Recurrent  Backpropagation 

Recurrent  backpropagation  has  demonstrated  the 
ability  to  model  complex  dynamical  systems.  Such 
a  capability  could  be  very  useful  in  guidance  and 
control  applications.  For  example,  consider  a  seeker 
system  that  must  distinguish  between  different 
types  of  objects  such  as  a  fighter  aircraft  and  a 
flare. One approach to distinguishing between
these  objects  is  by  their  dynamical  behavior. 

Flares  exhibit  very  simple  dynamical  behavior 
(they  fall)  while  fighter  aircraft  have  significantly 
more  complex  dynamical  behavior  (they  turn, 
accelerate,  etc.).  Such  dynamical  behavior  models 
could  be  developed  using  a  recurrent 
backpropagation  network.  The  network  would 
learn  to  predict  the  next  location  of  an  object  given 
its  recent  dynamical  behavior.  The  predicted 
location  could  then  be  compared  with  the  sensed 
location  to  make  a  targeting  decision.  The  network 
could  be  trained  using  actual  examples  of  human 
pilots  flying  either  real  or  simulated  aircraft. 
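
The predict-and-compare idea can be sketched as follows. The recurrent backpropagation network itself is not implemented here; a hand-written ballistic predictor stands in for a trained model of flare dynamics, and the synthetic tracks, the two-dimensional (x, z) geometry, and every function name are assumptions made only for illustration.

```python
import numpy as np

GRAVITY = np.array([0.0, -9.81])  # assumed (x, z) plane with z pointing up


def ballistic_predictor(prev_pos, last_pos, dt):
    """Hypothetical stand-in for a trained model of a falling object.

    Predicts the next position under constant gravitational
    acceleration from the two most recent positions.
    """
    return 2.0 * last_pos - prev_pos + GRAVITY * dt ** 2


def mean_prediction_error(track, predictor, dt):
    """Average distance between predicted and sensed positions."""
    track = np.asarray(track, dtype=float)
    errors = [
        np.linalg.norm(predictor(track[k - 2], track[k - 1], dt) - track[k])
        for k in range(2, len(track))
    ]
    return float(np.mean(errors))


def make_tracks(n=20, dt=1.0):
    """Synthetic example tracks: a falling object and a turning one."""
    t = np.arange(n)[:, None] * dt
    start = np.array([0.0, 3000.0])
    velocity = np.array([50.0, 0.0])
    falling = start + velocity * t + 0.5 * GRAVITY * t ** 2
    angle = 0.3 * np.arange(n)
    turning = np.column_stack(
        [100.0 * np.cos(angle), 3000.0 + 100.0 * np.sin(angle)]
    )
    return falling, turning


if __name__ == "__main__":
    falling, turning = make_tracks()
    for name, track in (("falling", falling), ("turning", turning)):
        err = mean_prediction_error(track, ballistic_predictor, dt=1.0)
        print(f"{name}: mean prediction error = {err:.3f}")
```

An object whose sensed positions stay close to the simple model's predictions behaves like a flare, while a large and persistent prediction error indicates more complex dynamics. In the scheme described above, learned network models would take the place of the hand-written predictor.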

7.2  Graded  Learning 

The graded learning network is most applicable to
control problems in which the objective of the
controller is difficult or impossible to measure. For
example, it may be desirable to design a missile
control system which maximizes its range. Such a
control system does not have an absolute measure
of performance since the maximum range
attainable is a function of the mission. However,
we can determine how often the missile reaches its
target and use this value to assign a success
measure to the control system. The graded learning
network can use this success information integrated
over a number of trial missions (either actual or
simulated) to learn an appropriate control law.

7.3  Hierarchical  Matched  Filter 

The  hierarchical  matched  filter  network  is  most 
applicable  to  spatiotemporal  pattern  classification 
problems  in  which  insensitivity  to  time  warp 
transformations is desired. An example of such a
problem  is  speech  recognition  in  which  we  desire  a 
system  that  can  classify  speech  independent  of  how 
fast  the  speaker  is  speaking. 

8  Conclusions 

This  paper  has  presented  three  new  neural  network 
architectures  for  addressing  complex 
spatiotemporal  mapping  problems  such  as  those 
encountered  in  guidance  and  control.  The  structure 
and  operation  of  each  network  was  reviewed,  and 
application suggestions were given. From this
discussion,  it  is  clear  that  advanced  neural  network 
architectures  hold  great  promise  for  developing 
next  generation  guidance  and  control  systems. 
Additional research and development aimed at
better characterizing the properties of these
networks and exploring their applications is
required to realize this promise.

UTTL: Optical controller for adaptive phased array
antennas using neural network architecture

AUTH: A/DECUSATIS, C.; B/DAS, P. PAA: A/(Rensselaer
Polytechnic Institute, Troy, NY) IN: Optoelectronic
signal processing for phased-array antennas II;
Proceedings of the Meeting, Los Angeles, CA, Jan. 16, 17,
1990 (A91-24926 09-32). Bellingham, WA, Society of
Photo-Optical Instrumentation Engineers, 1990, p. 161-172.
Research supported by USAF.

ABS: The control of adaptive phased array antennas using the
least mean squares (LMS) algorithm is shown to be
analogous to the implementation of a two-layer perceptron
neural network. The adaptive weights may be calculated
using the back propagation algorithm, which is a
generalized version of LMS. By using a full perceptron
model, additional adaptive weights are introduced at the
receiver; this is expected to improve performance over
existing systems. An optical processor for the control of
adaptive antennas is proposed, based on a two-level
perceptron. It is shown that currently available
technology is capable of realizing this receiver; the
optical architecture may also be applied to the demands of
future wideband interference suppression systems.
90/00/00 91A24942


UTTL: Simulation of heterogeneous neural networks on
serial and parallel machines

AUTH: A/LANGE, TRENT E. PAA: A/(California, University, Los
Angeles) Parallel Computing (ISSN 0167-8191), vol. 14,
Aug. 1990, p. 267-303. Research supported by the W. M.
Keck Foundation and ITA Foundation.

ABS: The development tool DESCARTES is described. This tool
provides researchers with the capability to simulate
heterogeneous connectionist networks in which the nodes
and links may have different processing characteristics
and effective cycling rates or which are made up of
modular, interacting sub-networks. DESCARTES also makes it
possible for researchers to build hybrid networks which
combine elements from distributed, localist, and symbolic
marker-passing networks. Currently, DESCARTES is
implemented on serial machines, where it is able to
simulate networks of medium size by utilizing the
spreading-activation process to prune unchanging nodes
from the update and spreading cycles. Simulation on SIMD
(Single Instruction Multiple Data) machines is discussed,
focusing on the SIMD simulation cycle, the cycle's update
stage, the SIMD cycle's spread out-to-links stage, and the
efficient backpropagation on SIMD machines. Simulation on
hypothetical MIMD (Multiple Instruction Multiple Data)
machines is also discussed. 90/08/00 91A22124


UTTL: Neural networks and the control of smart systems

AUTH: A/THURSBY, M. H.; B/GROSSMAN, B.; C/YOO, K. PAA:
C/(Florida Institute of Technology, Melbourne) IN:
U.S.-Japan Workshop on Smart/Intelligent Materials and
Systems, Honolulu, HI, Mar. 19-23, 1990, Proceedings
(A91-21207 07-23). Lancaster, PA, Technomic Publishing
Co., Inc., 1990, p. 242-251. Research supported by the
U.S. Army and Florida High Technology and Industry
Council.

ABS: Artificial neural networks (ANNs) and their ability to
model and control dynamical systems for smart structures,
including sensors, actuators, and plants, are considered.
Both linear and nonlinear systems have been successfully
modeled. Presently, two diverse regimes, smart mechanical
systems and smart electromagnetic systems, are being
developed. In order to better understand neural
controllers as used in the smart electromagnetic
structures, the study of ANNs is directed toward
understanding the ability of the network to approximate
system responses. Networks are being trained to mimic the
desired output of the system. The damped sinusoid was
chosen as the model and was approximated using a
Jordan-like iterative network. The results to date
indicate that the ANNs can easily mimic these systems -
the question is whether the mechanism that the network
applies can be related to the mechanisms for classical
analysis. 90/00/00 91A21214


UTTL: Neurocontrol of auto-lock-on target-tracking sight
control system

AUTH: A/CHEOK, KA C.; B/SMITH, JAMES C.; C/FERNANDO, JOSEPH P.
PAA: C/(Oakland University, Rochester, MI) Control and
Computers (ISSN 0315-8934), vol. 17, no. ?, 1989, p.
32-36.

ABS: Neural nets were used to implement the control of an
auto-lock-on target-tracking sight/vision control system.
The objective of the resultant target-tracking
neurocontrol system is to capture and emulate human
cognitive action in the eye-hand coordination for tracking
a target using a sight system. The paper describes how a
tracking neurocontroller was designed and implemented
using a microcomputer-based real-time animation simulator.
Successful tracking performance of the neurocontrol sight
system was achieved in the presence of pseudo-random
target maneuvers. 89/00/00 91A19981


UTTL: Electronic neural networks for global optimization

AUTH: A/THAKOOR, A. P.; B/MOOPENN, A. W.; C/EBERHARDT, S.
PAA: C/(JPL, Pasadena, CA) CORP: Jet Propulsion Lab.,
California Inst. of Tech., Pasadena. IN: Intelligent
control and adaptive systems; Proceedings of the Meeting,
Philadelphia, PA, Nov. 7, 8, 1989 (A91-19635 06-63).
Bellingham, WA, Society of Photo-Optical Instrumentation
Engineers, 1990, p. 170-177. Research sponsored by DARPA
and SDIO.

ABS: An electronic neural network with feedback architecture,
implemented in analog custom VLSI, is described. Its
application to problems of global optimization for dynamic
assignment is discussed. The convergence properties of the
neural network hardware are compared with computer
simulation results. The neural network's ability to
provide optimal or near optimal solutions within only a
few neuron time constants, a speed enhancement of several
orders of magnitude over conventional search methods, is
demonstrated. The effect of noise on the circuit dynamics
and the convergence behavior of the neural network
hardware is also examined. 90/00/00 91A19642


UTTL: Implementation of expert system/AI technology for
reducing ground test in present and future launch systems

AUTH: A/ENGLE, JAMES; B/OWEN, CHARLES; C/COLMENAREZ, LUIS
PAA: C/(Rockwell International Corp., Space Systems Div.,
Downey, CA) AIAA, Aerospace Sciences Meeting, 29th,
Reno, NV, Jan. 7-10, 1991. 11 p.

ABS: The application of expert system technology for prelaunch
and in-flight health monitoring is considered, and a
prelaunch expert system for the Orbiter maneuvering system
is outlined. Design requirements and technology concepts
for artificial-intelligence/expert-system-based approaches
that reduce ground operation costs for a reaction control
system on future vehicles are presented. A number of the
current AI enabling technologies for reducing ground
processing, including expert built-in-test, artificial
neural networks, and intelligent machine vision systems,
are discussed. Attention is concentrated on system
integration of AI techniques, engineering-support
automation, and intelligent operations paperless systems.

RPT#: AIAA PAPER 91-0655 91/01/00 91A19398


UTTL: Use of Hopfield neural networks in optimal guidance

AUTH: A/STECK, JAMES E.; B/BALAKRISHNAN, S. N. PAA:
B/(Missouri-Rolla, University, Rolla) AIAA, Aerospace
Sciences Meeting, 29th, Reno, NV, Jan. 7-10, 1991. 6 p.

ABS: A Hopfield neural network architecture for homing missile
guidance is considered in this study. A linear quadratic
optimal control problem is converted to a Hopfield neural
network structure. Several target-intercept scenarios are
provided to demonstrate the use of the neural net
formulation. Further research directions are recommended.

RPT#: AIAA PAPER 91-0587 91/01/00 91A19372


UTTL: Optical neurochip based on a three-layered
feed-forward model

AUTH: A/OHTA, J.; B/KOJIMA, K.; C/NITTA, Y.; D/TAI, S.;
E/KYUMA, K. PAA: E/(Mitsubishi Electric Corp., Central
Research Laboratory, Amagasaki, Japan) Optics Letters
(ISSN 0146-9592), vol. 15, Dec. 1, 1990, p. 1362-1364.

ABS: A GaAs/AlGaAs optical neurochip based on a three-layered
feed-forward model is reported. The optical neurochip
consists of a light-emitting diode array with 66 elements,
a fixed interconnection matrix, and a photodiode array
with 110 elements. The interconnection matrix is
determined by the backpropagation learning rule with three
quantized levels. There are 35, 29, and 26 neurons,
respectively, in the input, hidden, and output layers. The
excitatory and inhibitory synapses are integrated on one
chip. By using the chip and external electronics, the
recognition of 10 characters with 5x7 bits has been
achieved. 90/12/01 91A18667


UTTL: Feedback network with space invariant coupling

AUTH: A/HAEUSLER, GERD; B/LANGE, EBERHARD PAA:
B/(Erlangen-Nuernberg, Universitaet, Erlangen, Federal
Republic of Germany) Applied Optics (ISSN 0003-6935),
vol. 29, Nov. 10, 1990, p. 4798-4805.

ABS: Processing images by a neural network means performing a
repeated sequence of operations on the images. The
sequence consists of a general linear transformation and a
nonlinear mapping of pixel intensities. The general (shift
variant) linear transformation is time consuming for large
images if done with a serial computer. A shift invariant
linear transformation can be implemented much easier by
fast Fourier transform or optically, but the shift
invariant transform has fewer degrees of freedom because
the coupling matrix is Toeplitz. A neural convolution
network with shift invariant coupling that nevertheless
exhibits autoassociative restoration of distorted images
is presented. Besides the simple implementation, the
network has one more advantage: associative recall does
not depend on object position. 90/11/10 91A17348


UTTL: Neural computation of arithmetic functions

AUTH: A/SIU, KAI-YEUNG; B/BRUCK, JEHOSHUA PAA: A/(Stanford
University, CA); B/(IBM Almaden Research Center, San
Jose, CA) CORP: Stanford Univ., CA.; IBM Research Lab.,
San Jose, CA. IEEE, Proceedings (ISSN 0018-9219), vol.
78, Oct. 1990, p. 1669-1675. Research supported by the
Joint Services Electronics Program and USAF.

ABS: An area of application of neural networks is considered. A
neuron is modeled as a linear threshold gate, and the
network architecture considered is the layered feedforward
network. It is shown how common arithmetic functions such
as multiplication and sorting can be efficiently computed
in a shallow neural network. Some known results are
improved by showing that the product of two n-bit numbers
and sorting of n n-bit numbers can be computed by a
polynomial-size neural network using only four and five
unit delays, respectively. Moreover, the weights of each
threshold element in the neural networks require O(log
n)-bit (instead of n-bit) accuracy. These results can be
extended to more complicated functions such as multiple
products, division, rational functions, and approximation
of analytic functions. 90/10/00 91A14867


UTTL: Holographic implementation of a fully connected
neural network

AUTH: A/HSU, KEN-YUH; B/LI, HSIN-YU; C/PSALTIS, DEMETRI PAA:
A/(National Chiao Tung University, Hsinchu, Republic of
China); C/(California Institute of Technology, Pasadena)
IEEE, Proceedings (ISSN 0018-9219), vol. 78, Oct. 1990, p.
1637-1645. Research supported by DARPA and USAF.

ABS: A holographic implementation of a fully connected neural
network is presented. This model has a simple structure
and is relatively easy to implement, and its operating
principles and characteristics can be extended to other
types of networks, since any architecture can be
considered as a fully connected network with some of its
connections missing. The basic principles of the fully
connected network are reviewed. The optical implementation
of the network is presented. Experimental results which
demonstrate its ability to recognize stored images are
given, and its performance and analysis are discussed
based on a proposed model for the system. Special
attention is focused on the dynamics of the feedback loop
and the tradeoff between distortion tolerance and
image-recognition capability of the associative memory.
90/10/00 91A14685


UTTL: Maximum a posteriori decision and evaluation of
class probabilities by Boltzmann perceptron classifiers

AUTH: A/YAIR, EYAL; B/GERSHO, ALLEN PAA: A/(IBM Scientific
Center, Haifa, Israel); B/(California, University, Santa
Barbara) IEEE, Proceedings (ISSN 0018-9219), vol. 78,
Oct. 1990, p. 1620-1626. Research supported by the
Weizmann Foundation for Scientific Research, University of
California, Bell Communications Research, Inc., et al.

ABS: Neural-network architectures which may offer a valuable
alternative to the Bayesian classifier are described. In
networks, the a posteriori probabilities are computed with
no a priori assumptions about the probability distribution
functions that generate the data; the neural classifier
uses a general type of input-output mapping which is
designed to optimally comply with a given training set. It
is shown that the a posteriori class probabilities can be
efficiently computed by a deterministic feedforward
network which is called the Boltzmann perceptron
classifier (BPC). Maximum a posteriori classifiers are
also constructed as a special case of the BPC. Structural
relationships between the BPC and a conventional
multilayer perceptron are given, and it is demonstrated
that rather intricate boundaries between classes can be
formed even with a relatively modest number of network
units. Simulation results show that the BPC is comparable
in performance to a Bayesian classifier. 90/10/00
91A14883


UTTL: Nearest neighbor pattern classification perceptrons

AUTH: A/MURPHY, OWEN J. PAA: A/(Vermont, University,
Burlington) IEEE, Proceedings (ISSN 0018-9219), vol. 78,
Oct. 1990, p. 1595-1598.

ABS: A three-layer perceptron that uses the nearest-neighbor
pattern-classification rule is presented. This neural
network is of interest because it is designed specifically
for the set of training patterns, and incorporating the
training of the network into the design eliminates the
need for the use of training algorithms. The technique
therefore provides an alternative to the limitations and
unpredictability (such as having too many, too few, or
inappropriate training patterns) of the known training
techniques. Since the nearest-neighbor classification rule
is used, the network is capable of forming arbitrarily
complex decision regions. The design and training of the
network can be completed in polynomial time, whereas it
has been shown that training a neural network is an
NP-complete problem. 90/10/00 91A14880
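
The nearest-neighbor rule named in the abstract above can be illustrated with a minimal sketch. The function name, the sample points, and the choice of Euclidean distance are assumptions made for illustration only; this is the plain 1-NN decision rule, not the three-layer perceptron construction of the cited paper.

```python
import math


def nearest_neighbor_classify(train_points, train_labels, query):
    """Return the label of the training point closest to `query` (1-NN rule)."""
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        dist = math.dist(point, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical two-class training set in the plane.
train_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_labels = ["a", "a", "a", "b"]

label_near_origin = nearest_neighbor_classify(train_points, train_labels, (0.2, 0.1))
label_far_corner = nearest_neighbor_classify(train_points, train_labels, (4.0, 4.5))
```

Because every training pattern is stored and compared directly, no iterative training step is involved, which is the property the abstract emphasizes.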


UTTL: Backpropagation through time - What it does and how
to do it

AUTH: A/WERBOS, PAUL J. PAA: A/(NSF, Washington, DC) IEEE,
Proceedings (ISSN 0018-9219), vol. 78, Oct. 1990, p.
1550-1560.

ABS: Backpropagation, which is a simple method now being widely
used in areas like pattern recognition and fault
diagnosis, is reviewed. The basic equations for
backpropagation through time, and applications to areas
like pattern recognition involving dynamic systems,
systems identification, and control, are discussed.
Further extensions of this method, to deal with systems
other than neural networks, systems involving simultaneous
equations, or true recurrent networks, and other practical
issues arising with the method are described. Pseudocode
is provided to clarify the algorithms. The chain rule for
ordered derivatives (the theorem which underlies
backpropagation) is briefly discussed. The focus is on
designing a simpler version of backpropagation which can
be translated into computer code and applied directly by
neural-network users. 90/10/00 91A14674


UTTL: 30 years of adaptive neural networks - Perceptron,
Madaline, and backpropagation

AUTH: A/WIDROW, BERNARD; B/LEHR, MICHAEL A. PAA: B/(Stanford
University, CA) CORP: Stanford Univ., CA. IEEE,
Proceedings (ISSN 0018-9219), vol. 78, Sept. 1990, p.
1415-1442. Research sponsored by SDIO and Lockheed
Missiles and Space Co., Inc.

ABS: Fundamental developments in feedforward artificial neural
networks from the past thirty years are reviewed. The
history, origination, operating characteristics, and basic
theory of several supervised neural-network training
algorithms (including the perceptron rule, the
least-mean-square algorithm, three Madaline rules, and the
backpropagation technique) are described. The concept
underlying these iterative adaptation algorithms is the
minimal disturbance principle, which suggests that during
training it is advisable to inject new information into a
network in a manner that disturbs stored information to
the smallest extent possible. The two principal kinds of
online rules that have developed for altering the weights
of a network are examined for both single-threshold
elements and multielement networks. They are
error-correction rules, which alter the weights of a
network to correct error in the output response to the
present input pattern, and gradient rules, which alter the
weights of a network during each pattern presentation by
gradient descent with the objective of reducing
mean-square error (averaged over all training patterns).
90/09/00 91A14870
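
As a hedged illustration of the gradient rules mentioned in the abstract above, the sketch below applies a least-mean-square style update to a single linear element, one pattern presentation at a time. The names `lms_update`, `train_lms`, and `mu`, the step size, and the toy data are assumptions for illustration and are not taken from the cited paper.

```python
def lms_update(weights, inputs, target, mu=0.1):
    """One LMS-style step: nudge the weights down the squared-error gradient."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    new_weights = [w + mu * error * x for w, x in zip(weights, inputs)]
    return new_weights, error


def train_lms(patterns, targets, mu=0.1, epochs=500):
    """Present every pattern in turn, updating the weights after each one."""
    weights = [0.0] * len(patterns[0])
    for _ in range(epochs):
        for inputs, target in zip(patterns, targets):
            weights, _ = lms_update(weights, inputs, target, mu)
    return weights


# Toy data generated by the linear map y = 2*x1 - x2 (an assumed example).
patterns = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [2.0, -1.0, 1.0]
weights = train_lms(patterns, targets)
```

Each update changes the weights only in the direction of the current input pattern and in proportion to the current error, so a small step size keeps the disturbance to previously learned responses small.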


UTTL: Expert systems and advanced automation for space
missions operations

AUTH: A/DURRANI, SAJJAD H.; B/PERKINS, DOROTHY C.; C/CARLTON,
P. DOUGLAS PAA: A/(NASA, Office of Space Operations,
Washington, DC); B/(NASA, Goddard Space Flight Center,
Greenbelt, MD); C/(Computer Sciences Corp., Laurel, MD)
CORP: National Aeronautics and Space Administration,
Washington, DC; National Aeronautics and Space
Administration. Goddard Space Flight Center, Greenbelt,
MD; Computer Sciences Corp., Laurel, MD. IAF,
International Astronautical Congress, 41st, Dresden,
Federal Republic of Germany, Oct. 6-12, 1990. 6 p.

ABS: Increased complexity of space missions during the 1980s
led to the introduction of expert systems and advanced
automation techniques in mission operations. This paper
describes several technologies in operational use or under
development at the National Aeronautics and Space
Administration's Goddard Space Flight Center. Several
expert systems are described that diagnose faults, analyze
spacecraft operations and onboard subsystem performance
(in conjunction with neural networks), and perform data
quality and data accounting functions. The design of
customized user interfaces is discussed, with examples of
their application to space missions. Displays, which allow
mission operators to see the spacecraft position,
orientation, and configuration under a variety of
operating conditions, are described. Automated systems for
scheduling are discussed, and a testbed that allows tests
and demonstrations of the associated architectures,
interface protocols, and operations concepts is described.
Lessons learned are summarized.

RPT#: IAF PAPER 90-405 90/10/00 91A14013

UTTL:  Identification  of  aerospace  acoustic  sources  using 
sparse  distributed  associative  memory 

AUTH: A/SCOTT, E. A.; B/FULLER, C. R.; C/O'BRIEN, W. F. PAA:
C/(Virginia Polytechnic Institute and State University,
Blacksburg) CORP: Virginia Polytechnic Inst. and State
Univ., Blacksburg. AIAA, Aeroacoustics Conference, 13th,
Tallahassee, FL, Oct. 22-24, 1990. 12 p.

ABS:  A  pattern  recognition  system  has  been  developed  to 

classify  five  different  aerospace  acoustic  sources.  In 
this  paper  the  performance  of  two  new  classifiers,  an 
associative  memory  classifier  and  a  neural  network 
classifier,  is  compared  to  the  performance  of  a  previously 
designed  system.  Sources  are  classified  using  features 
calculated  from  the  time  and  frequency  domain.  Each 
classifier  undergoes  a  training  period  where  it  learns  to 
classify  sources  correctly  based  on  a  set  of  known 
sources.  After  training  the  classifier  is  tested  with 
unknown  sources.  Results  show  that  over  96  percent  of 
sources  were  identified  correctly  with  the  new  associative 
memory  classifier.  The  neural  network  classifier 
identified over 81 percent of the sources correctly.

RPT#: AIAA PAPER 90-3992 90/10/00 91A12505


UTTL: Modified backpropagation algorithm for fast learning
in neural networks

AUTH: A/REYNERI, L. M.; B/FILIPPI, E. PAA: B/(Torino,
Politecnico, Turin, Italy) Electronics Letters (ISSN
0013-5194), vol. 26, Sept. 13, 1990, p. 1564-1566.

ABS: A fast learning rule for artificial neural systems which
is based on modifications to a backpropagation algorithm
is described. The rule minimizes the error function along
the direction of the gradient and backpropagates the error
pattern according to a constant error energy approach.
90/09/13 91A12410


UTTL: Application of adjoint operators to neural learning

AUTH: A/BARHEN, J.; B/TOOMARIAN, N.; C/GULATI, S. PAA:
A/(JPL; California Institute of Technology, Pasadena);
C/(JPL, Pasadena, CA) CORP: Jet Propulsion Lab.,
California Inst. of Tech., Pasadena; California Inst. of
Tech., Pasadena. Applied Mathematics Letters (ISSN
0893-9659), vol. 3, no. 3, 1990, p. 13-18. Research
supported by DOD and DOE.

ABS: A technique for the efficient analytical computation of
such parameters of the neural architecture as synaptic
weights and neural gain is presented as a single solution
of a set of adjoint equations. The learning model
discussed concentrates on the adiabatic approximation
only. A problem of interest is represented by a system of
N coupled equations, and then adjoint operators are
introduced. A neural network is formalized as an adaptive
dynamical system whose temporal evolution is governed by a
set of coupled nonlinear differential equations. An
approach based on the minimization of a constrained
neuromorphic energylike function is applied, and the
complete learning dynamics are obtained as a result of the
calculations. 90/00/00 90A50026


UTTL: Integration of parallel image processing with
symbolic and neural computations for imagery exploitation

AUTH: A/ROMAN, EVELYN PAA: A/(Optical Systems and Equipment,
Lexington, MA) IN: Airborne reconnaissance XIII;
Proceedings of the Meeting, San Diego, CA, Aug. 7-9, 1989
(A90-48601 22-06). Bellingham, WA, Society of
Photo-Optical Instrumentation Engineers, 1989, p. 72-83.

ABS: Work combining parallel, symbolic, and neural
methodologies at different stages of processing for
imagery exploitation is discussed, together with a
prototype system combining real-time parallel image
processing on an 8-stage parallel image-processing engine
(PIPE) computer with expert system software. A summary of
basic neural concepts is given, and the commonality
between neural nets and related mathematics, artificial
intelligence, and traditional image processing concepts is
shown. This provides numerous choices for the
implementation of constraint satisfaction,
transformational invariance, inference and
representational mechanisms, and software lifecycle
engineering methodologies in the different computational
layers. 89/00/00 90A48609


UTTL: Neural net classifier for millimeter wave radar

AUTH: A/BROWN, JOE P.; B/ARCHER, SUE; C/BOWER, MARK R. PAA:
C/(Martin Marietta Electronic Systems, Orlando, FL) IN:
Real-time signal processing XII; Proceedings of the
Meeting, San Diego, CA, Aug. 10, 11, 1989 (A90-48408
22-32). Bellingham, WA, Society of Photo-Optical
Instrumentation Engineers, 1989, p. 71-76.

ABS: This paper describes the development of a neural net
classifier for use in an automatic target recognition
(ATR) system using millimeter wave (MMW) radar data. Two
distinctive neural net classifiers were developed using
mapping models (backpropagation and counterpropagation)
and compared to a quadratic (Bayesian-like) classifier. A
statistical feature set and a radar data set were used for
both training and testing all three classifier systems.
This statistical feature set is often used to test MMW
ATRs prior to using actual data. Results are presented and
indicate that the backpropagation net performed at near
100 percent accuracy for the statistical feature set and
slightly outperformed the counterpropagation model in this
application. Both networks show promising results using
real radar data. 89/00/00 90A48413


UTTL: Application of neural networks to automatic control

AUTH: A/GOLDENTHAL, WILLIAM; B/FARRELL, JAY PAA: B/(Charles
Stark Draper Laboratory, Inc., Cambridge, MA) IN: AIAA
Guidance, Navigation and Control Conference, Portland, OR,
Aug. 20-22, 1990, Technical Papers. Part 2 (A90-47576
21-08). Washington, DC, American Institute of Aeronautics
and Astronautics, 1990, p. 1108-1112.

ABS: The design of a robust control system for vehicles with
highly nonlinear, time-varying, or poorly-modeled dynamics
poses serious difficulties for all currently advocated
design methodologies. These difficulties arise in the
design of current aerospace and underwater vehicles and
are crucial for proposed autonomous vehicles. In the
present paper the use of neural networks in adaptive
control loops is proposed, based on the fact that
feedforward neural networks with at least one hidden layer
have been shown to be dense (under suitable assumptions)
on the set of continuous functions. Thus, by the use of a
suitable adaptive learning algorithm, the interconnection
weights of the network could be selected so that the
network approximates the desired nonlinear control law to
any specified accuracy. An extension of the
backpropagation algorithm is presented which adaptively
determines the interconnection parameters necessary for
the neural network to function as a closed-loop controller
and to force the closed-loop system to match a desired
reference response. An example of the application of this
algorithm to the control of the cart pole system is
included.

RPT#: AIAA PAPER 90-3438 90/00/00 90A47691


UTTL: Advanced architecture for domestic and global
aviation systems

AUTH: A/KORBEL, CLAYTON C. PAA: A/(Martin Marietta Information
Systems Group, Bethesda, MD) IN: Radio Technical
Commission for Aeronautics, Annual Assembly and Technical
Symposium, Washington, DC, Dec. 4-6, 1989, Proceedings
(A90-46390 21-04). Washington, DC, Radio Technical
Commission for Aeronautics, 1989, p. 197-209.

ABS: Candidate elements for the future aviation systems are
outlined, and top-down as well as bottom-up system
architecture approaches are examined, and it is noted that
automation and human factors will dominate the system
including airspace and flight management subsystems.
Communications systems, surveillance, and navigation and
landing are discussed. Since the systems under
consideration include possible synergisms, redundancies,
and back-up capabilities, possible options and trade-offs
are analyzed. Key technologies for future aviation systems
such as the GPS/GLONASS integrated receiving set,
real-time expert system/neural networks, antenna avionics,
interactive speech and display processing, satellite
communication equipment, and microwave monolithic
integrated circuits are presented. 89/00/00 90A46398


UTTL: Neural network systems

AUTH: A/GUYON, ISABELLE PAA: A/(AT&T Bell Laboratories,
Holmdel, NJ) IN: International Symposium on Numerical
Methods in Engineering, 5th, Lausanne, Switzerland, Sept.
11-15, 1989, Proceedings. Volume 1 (A90-44401 20-31).
Southampton, England and New York/Berlin, Computational
Mechanics Publications/Springer-Verlag, 1989, p. 203-210.

ABS: In the last few years, devices inspired by the
architecture of the brain have become much more powerful.
A lot of effort has been concentrated on networks using
very rough models of neuron cells (formal neurons). The
ability of such systems to learn from examples is a
particularly attractive feature. Research is still in its
infancy, but it is expected that these models will be
useful both as models of real brain function, and as
computational devices for many applications including
optimization, pattern recognition, speech analysis, and
signal processing. Architectures and the associated
learning algorithms that have been proposed are reviewed.
The notion of generalization from the training examples
is explained, and various examples of problems and
applications of practical interest that can be handled by
neural networks are presented. 89/00/00 90A44412


UTTL: Neural networks for automatic target recognition

AUTH: A/ROTH, MICHAEL W. PAA: A/(Johns Hopkins University,
Laurel, MD) Johns Hopkins APL Technical Digest (ISSN
0270-5214), vol. 11, Jan.-June 1990, p. 117-120. Research
supported by the Johns Hopkins University.

ABS: The use of neural networks and neurocomputers is discussed
and their applications for automatic target recognition
(ATR) are reviewed. A framework is presented illustrating
the application of neural network technology to the
solution of the ATR problem of recognizing high-value
targets in noisy environments and discriminating them from
low-value objects and false alarms. Neural network tools
which may be applied to ATR needs include collective
computation for fast optimization, neural network
learning algorithms, neural network inspired feature
selection, and a neural network for higher vision. An
example of a binocular stereo displacement map produced
using model images and preliminary stereo calculations on
the Connection Machine at the Naval Research Laboratory is
presented and discussed. It is pointed out that neural
learning could facilitate the development of both
automatic knowledge acquisition and continuous system
refinement, two important ATR advances. 90/06/00
90A44324


UTTL: New directions in missile guidance - Signal
processing based on neural networks and fractal modeling

AUTH: A/BOONE, BRADLEY G.; B/CONSTANTIKES, KIM T.; C/FRY,
ROBERT L.; D/GILBERT, ALLEN S.; E/KULP, ROBERT L. PAA:
E/(Johns Hopkins University, Laurel, MD) Johns Hopkins
APL Technical Digest (ISSN 0270-5214), vol. 11, Jan.-June
1990, p. 28-38.

ABS: Projects investigating the utility of signal processing
based on neural networks and fractal scene modeling are
discussed. New approaches to target recognition and
scene-matching development are examined with attention to
the performance and characteristics of image-based scene
matchers. A discussion on new models and representations
for missile guidance includes an investigation of neural
network learning models emphasizing the training phase and
the various alternatives to target representation. An
investigation of the recognition of range-profile ship
signatures using a back-propagation neural net with
comparisons to baseline statistical classifiers is
described. Prospects for future work are discussed
including innovative approaches to target acquisition.
90/06/00 90A44318


UTTL: Neural networks for control and system
identification

AUTH: A/WERBOS, PAUL J. PAA: A/(NSF, Washington, DC) IN:
IEEE Conference on Decision and Control, 28th, Tampa, FL,
Dec. 13-15, 1989, Proceedings. Volume 1 (A90-40776 18-63).
New York, Institute of Electrical and Electronics
Engineers, 1989, p. 260-265.

ABS: A review is presented of the field of neuroengineering as
a whole, highlighting the importance of neurocontrol and
neuroidentification. Then a description is given of the
five major architectures in use today in neurocontrol (in
robotics, in particular) and a few areas for future
research. Also included are comments on
neuroidentification. 89/00/00 90A40788


UTTL: Obscured object recognition for an ATR application

AUTH: A/EICHMANN, G.; B/JANKOWSKI, M.; C/BASU, S.;
D/STOJANCIC, M.; E/ROYTMAN, L. PAA: E/(City College,
New York) IN: Advances in image compression and
automatic target recognition; Proceedings of the Meeting,
Orlando, FL, Mar. 30, 31, 1989 (A90-39951 17-63).
Bellingham, WA, Society of Photo-Optical Instrumentation
Engineers, 1989, p. 66-73.

ABS: A common and mainly unsolved problem in image processing
is occlusion. Occlusion occurs when one or more objects
obstruct the sensor's view. In this paper, three methods,
a neural network, a superresolving non-parametric
predictor, and an Extended-Post Context-free Grammar
syntactic pattern recognizer, are used to generate the
missing data. To illustrate these methods, their
application to the reconstruction of obscured Roman
characters is presented. 89/00/00 90A39958


UTTL: The elements of adaptive neural expert systems

AUTH: A/HEALY, MICHAEL J. PAA: A/(Boeing Computer Services,
Seattle, WA) IN: Applications of artificial intelligence
VII; Proceedings of the Meeting, Orlando, FL, Mar. 28-30,
1989. Part 2 (A90-36876 17-63). Bellingham, WA, Society of
Photo-Optical Instrumentation Engineers, 1989, p. 830-837.

ABS: The generalization properties of a class of neural
architectures can be modeled mathematically. The model is
a parallel predicate calculus based on pattern recognition
and self-organization of long-term memory in a neural
network. It may provide the basis for adaptive expert
systems capable of inductive learning and rapid processing
in a highly complex and changing environment. 89/00/00
90A38963


UTTL: Neural networks for self-learning control systems

AUTH: A/NGUYEN, DERRICK H.; B/WIDROW, BERNARD PAA:
B/(Stanford University, CA) CORP: Stanford Univ., CA.
IEEE Control Systems Magazine (ISSN 0272-1708), vol. 10,
April 1990, p. 18-33. Research supported by SDIO, USAF,
Thomson-CSF, and Lockheed Missiles and Space Co., Inc.

ABS: It is shown how a neural network can learn of its own
accord to control a nonlinear dynamic system. An emulator,
a multilayered neural network, learns to identify the
system's dynamic characteristics. The controller, another
multilayered neural network, next learns to control the
emulator. The self-trained controller is then used to
control the actual dynamic system. The learning process
continues as the emulator and controller improve and track
the physical process. An example is given to illustrate
these ideas. The 'truck backer-upper,' a neural network
controller that steers a trailer truck while the truck is
backing up to a loading dock, is demonstrated. The
controller is able to guide the truck to the dock from
almost any initial position. The technique explored should
be applicable to a wide variety of nonlinear control
problems. 90/04/00 90A37571


UTTL: Survey of neural network technology for automatic
target recognition

AUTH: A/ROTH, MICHAEL W. PAA: A/(Johns Hopkins University,
Laurel, MD) IEEE Transactions on Neural Networks (ISSN
1045-9227), vol. 1, March 1990, p. 28-43.

ABS: A review is presented of ATR (automatic target
recognition), and some of the highlights of neural network
technology developments that have the potential for making
a significant impact on ATR are discussed. In particular,
neural network technology developments in the areas of
collective computation, learning algorithms, expert
systems, and neurocomputer hardware could provide crucial
tools for developing improved algorithms and computational
hardware for ATR. The discussion covers previous ATR
system efforts, ATR issues and needs, early vision and
collective computation, learning and adaptation for ATR,
feature extraction, higher vision and expert systems, and
neurocomputer hardware. 90/03/00 90A34467


UTTL: Identification and control of dynamical systems
using neural networks

AUTH: A/NARENDRA, KUMPATI S.; B/PARTHASARATHY, KANNAN PAA:
B/(Yale University, New Haven, CT) IEEE Transactions on
Neural Networks (ISSN 1045-9227), vol. 1, March 1990, p.
4-27. Research supported by Sandia National Laboratories.

ABS: It is demonstrated that neural networks can be used
effectively for the identification and control of
nonlinear dynamical systems. The emphasis is on models for
both identification and control. Static and dynamic
back-propagation methods for the adjustment of parameters
are discussed. In the models that are introduced,
multilayer and recurrent networks are interconnected in
novel configurations, and hence there is a real need to
study them in a unified fashion. Simulation results reveal
that the identification and adaptive control schemes
suggested are practically feasible. Basic concepts and
definitions are introduced throughout, and theoretical
questions that have to be addressed are also described.
90/03/00 90A34466


UTTL: Comparison of model based vision, statistical based,
and neural net based ATRs

AUTH: A/THEIS, TIMOTHY J.; B/AKERMAN, ALEXANDER, III PAA:
B/(I-MATH Associates, Inc., Orlando, FL) IN: NAECON 89;
Proceedings of the IEEE National Aerospace and Electronics
Conference, Dayton, OH, May 22-28, 1989. Volume 4
(A90-30676 12-01). New York, Institute of Electrical and
Electronics Engineers, Inc., 1989, p. 1733-1738.

ABS: An effort is made to establish a common ground upon which
a comparison of model-based vision (MBV),
statistical-based, and neural-net-based (NN) automatic
target recognizer (ATR) approaches can be performed. A
definition for each type of ATR as compared to a
ATR is provided. Upon these definitions, the differences,
purported risks, and benefits are described. It is found
that the comparison between statistical, MBV, and NN
approaches to ATR can only be made at a very high system
level. The differences primarily deal with how the desired
target is represented within the ATR. These representation
differences lead to other implementation differences,
which affect the performance flexibility and technical
achievability of each approach as it is faced with the
realities of new target types and engagement conditions.
It is noted that as attempts are made to become more
specific, there are always attempts to indicate that a
particular technique does not belong exclusively to one
class of recognizers versus another. Indeed, a hybrid
approach of using models to train a statistical-based
classifier is valid, but not clearly separable into one
class of recognizers. 89/00/00 90A30788


UTTL: Intelligent Mission Adaptive Controller (IMAC)

AUTH: A/GEIGER, KEVIN; B/EDSON, BRUCE; C/MCCORD, JIM PAA:
C/(USAF, Avionics Laboratory, Wright-Patterson AFB, OH)
IN: NAECON 89; Proceedings of the IEEE National Aerospace
and Electronics Conference, Dayton, OH, May 22-28, 1989.
Volume 3 (A90-30676 12-01). New York, Institute of
Electrical and Electronics Engineers, Inc., 1989, p.
1186-1192.

ABS: The Intelligent Mission-Adaptive Controller (IMAC)
research program is investigating distributed-AI (DAI) and
adaptive neural system (ANS) technologies for application
in active electronic-countermeasure (ECM) resource
management. The threat environment for tactical and
strategic aircraft requires the ECM system to handle
numerous fast-reacting, sometimes agile systems which vary
in function from acquisition to weapons guidance. IMAC is
an attempt to capture and demonstrate the important
concepts of an ECM resource manager. It deals with the
tradeoffs between ECM effectiveness, system costs, and
near- and far-term survivability. Preliminary results show
that a 'spreadsheet' format for acquiring
threat/threat-response information is superior to
decision-tree and fuzzy-cognitive-map formats. Capturing
complex correlations is found to be the key problem for
which a good knowledge-representation scheme is essential.
89/00/00 90A30765


UTTL: An application of neural net technology to
surveillance information correlation and battle outcome
prediction

AUTH: A/MALONEY, P. SUSIE PAA: A/(Lockheed Missiles and Space
Co., Inc., Austin, TX) IN: NAECON 89; Proceedings of the
IEEE National Aerospace and Electronics Conference,
Dayton, OH, May 22-28, 1989. Volume 2 (A90-30676 12-01).
New York, Institute of Electrical and Electronics
Engineers, Inc., 1989, p. 948-955. Research supported by
the Lockheed Missiles and Space Co., Inc.

ABS: The PNN (probabilistic neural network) is a three-layer
feed-forward network that uses sums of Gaussian
distributions to estimate the pdf for a training data set.
This trained network can then be used to classify new data
sets and to provide a probability associated with each
classification. The PNN has been applied successfully to
two separate ELINT emitter correlation problems
(hull-to-emitter and land-based emitter correlation). Each
of these applications achieved a high degree of accuracy
in identifying the correct emitter among many possible
emitters, at an extremely fast rate (about 200,000 times
faster than a standard back-propagation neural network).
PNN also shows great potential for solving other
surveillance-analysis problems; an application to a
battle-outcome prediction problem is described. 89/00/00
90A30749
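
The idea summarized in the abstract above, estimating each class pdf as a sum of Gaussians centered on the training samples and reporting a probability with each classification, can be sketched as follows. The function name `pnn_classify`, the shared smoothing width `sigma`, the equal class priors, and the toy emitter data are all assumptions for illustration and are not taken from the cited paper.

```python
import math


def pnn_classify(train, query, sigma=0.5):
    """Classify `query` with a Parzen-window (sum of Gaussians) estimate.

    `train` maps each class label to a list of feature tuples. Equal class
    priors and one shared `sigma` are assumed, so the Gaussian normalizing
    constant cancels and is omitted.
    """
    scores = {}
    for label, points in train.items():
        total = sum(
            math.exp(-math.dist(p, query) ** 2 / (2.0 * sigma ** 2))
            for p in points
        )
        scores[label] = total / len(points)  # average kernel response per class
    norm = sum(scores.values())
    if norm == 0.0:
        # Query is too far from every sample to score; fall back to uniform.
        probs = {label: 1.0 / len(scores) for label in scores}
    else:
        probs = {label: s / norm for label, s in scores.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical two-feature measurements for two emitters.
train = {
    "emitter_a": [(0.0, 0.0), (0.2, 0.1)],
    "emitter_b": [(3.0, 3.0), (3.1, 2.9)],
}
label, probs = pnn_classify(train, (0.1, 0.0))
```

No iterative weight adjustment is needed in this scheme: storing the training samples is the whole training step, which is consistent with the speed advantage over back-propagation that the abstract reports.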


UTTL: The Adaptive Network Cognitive Processor

AUTH: A/EDSON, BRUCE; B/TURNER, CHERYL; C/MYERS, MICHAEL;
D/SIMPSON, PAT PAA: A/(USAF, Avionics Laboratory,
Wright-Patterson AFB, OH); C/(TRW MEAD AI Center, San
Diego, CA); D/(VERAC, Inc., San Diego, CA) IN: AAAIC
'88 - Aerospace Applications of Artificial Intelligence;
Proceedings of the Fourth Annual Conference, Dayton, OH,
Oct. 25-27, 1988. Volume 1 (A90-30226 12-59). Xenia, OH,
Dayton SIGART, 1988, p. 133-143.

ABS: The Adaptive Network Cognitive Processor (ANCP) project is
an experiment in the use of adaptive network systems to
capture the cognitive processes used for deploying
electronic countermeasures by a fighter aircraft in an
electronic warfare threat environment. A functional
architecture was developed and initially implemented using
the Mark III neurocomputer. The main capabilities of the
ANCP demonstrated were: internal modeling of the threat
environment (Field Interaction Net), adaptive flight route
planning (Gradient Descent), reflexive threat response
(Feed Forward Net) augmented with a reflective or expert
threat response (Fuzzy Cognitive Map) in unfamiliar
situations (Confidence Filter), on-board recording of
unfamiliar situation/expert response for later retraining
(Back Error Propagation) as a reflexive response, and
initial training with a Learning Apprentice. 88/00/00
90A30331


UTTL: Optoelectronic implementations of neural networks

AUTH: A/PSALTIS, DEMETRI; B/YAMAMURA, ALAN A.; C/LIN, STEVEN;
D/GU, XIANG-GUANG; E/HSU, KEN PAA: D/(California
Institute of Technology, Pasadena); E/(National Chiao
Tung University, Hsinchu, Republic of China) IEEE
Communications Magazine (ISSN 0163-6804), vol. 27, Nov.
1989, p. 37-40, 71. Research supported by DARPA, USAF, and
U.S. Army.

ABS: The ability of optical systems to provide the massive
interconnections between processors required in most
neural network models, which constitutes their chief
advantage for such applications, is discussed, focusing on
holography. Because of the essential nonlinearity of the
holographic connections, nonlinear processing elements are
needed to perform complex computations. The use of GaAs
hybrid optoelectronic processing elements is examined.
GaAs is an excellent material for this purpose, since it
can be used to fabricate both fast electronic circuits and
optical sources and detectors. It is shown how a complete
hybrid neural computer can be implemented using available
technology developed for conventional computing. An
experimentally demonstrated network in which optics plays
an even larger role is described.

RPT#: AD-A217133 89/11/00 90A22506


UTTL: Information theory, complexity, and neural networks
AUTH: A/ABU-MOSTAFA, YASER S. PAA: A/(California Institute of
Technology, Pasadena) IEEE Communications Magazine (ISSN
0163-6804), vol. 27, Nov. 1989, p. 25-28, 81.

ABS: Some of the main results in the mathematical evaluation of
neural networks as information processing systems are
discussed. The basic operation of feedback and
feed-forward neural networks is described. Their memory
capacity and computing power are considered. The concept
of learning by example as it applies to neural networks is
examined. 89/11/00 90A22504


UTTL: Automatic target recognition on the connection
machine

AUTH: A/BUCHANAN, J. ROBERT PAA: A/(Johns Hopkins University,
Laurel, MD) Johns Hopkins APL Technical Digest (ISSN
0270-5214), vol. 10, July-Sept. 1989, p. 208-215.

ABS: Automatic target recognition (ATR) is a computationally
intensive problem that benefits from the abilities of the
Connection Machine (CM), a massively parallel computer
used for data-level parallel computing. The large
computational resources of the CM can efficiently handle
an approach to ATR that uses parallel stereo-matching and
neural-network algorithms. Such an approach shows promise
as an ATR system of satisfactory performance. 89/09/00
90A11938


UTTL: Applications of neural networks to avionics systems
AUTH: A/SEIDMAN, ABRAHAM N. PAA: A/(Northrop Corp., Aircraft
Div., Hawthorne, CA) AIAA Computers in Aerospace
Conference, 7th, Monterey, CA, Oct. 3-5, 1989. 11 p.
ABS: The application of neural networks is discussed as a
method of solution to a number of outstanding problems in
aircraft avionics. The areas of application of artificial
neural networks to avionics dealt with are (1) target
selection and (2) attack planning/steering. The target
selection is approached by the application of a
feed-forward, backpropagation network. The attack
planning/steering is approached by a new type of parallel
processing neural network.

RPT#: AIAA PAPER 89-3093 89/10/00 90A10627


UTTL: A comparison of CMAC neural network and traditional
adaptive control systems

AUTH: A/KRAFT, L. GORDON; B/CAMPAGNA, DAVID P. PAA: B/(New
Hampshire, University, Durham, NH) IN: 1989 American
Control Conference, 8th, Pittsburgh, PA, June 21-23, 1989,
Proceedings. Volume 1 (A89-53951 24-63). New York,
Institute of Electrical and Electronics Engineers, 1989,
p. 884-889.

ABS: A neural-network-based controller similar to the
cerebellar model arithmetic computer (CMAC) method of
Miller et al. (1987) is compared to a self-tuning
regulator and a Lyapunov-based model reference controller.
The three control algorithms are tested on exactly the
same control problems. Results are obtained when the
system being controlled is linear and noise-free, when
noise is added to the measurements, and when a nonlinear
system is controlled. Comparisons made with respect to
closed-loop system stability, speed of adaptation, noise
rejection, robustness, the number of required
calculations, and system tracking performance indicate
that the neural-network approach exhibits the potential
for solving some of the problems that have plagued more
traditional adaptive control systems. 89/00/00
89A53996


UTTL: Adaptive pattern recognition and neural networks

AUTH: A/PAO, YOH-HAN PAA: A/(Case Western Reserve University,
Cleveland, OH) Reading, MA, Addison-Wesley Publishing
Co., Inc., 1989, 327 p.

ABS: The application of neural-network computers to
pattern-recognition tasks is discussed in an introduction
for advanced students. Chapters are devoted to the nature
of the pattern-recognition task, the Bayesian approach to
the estimation of class membership, the fuzzy-set
approach, patterns with nonnumeric feature values,
learning discriminants and the generalized perceptron,
recognition and recall on the basis of partial cues,
associative memories, self-organizing nets, the
functional-link net, fuzzy logic in the linking of
symbolic and subsymbolic processing, and adaptive pattern
recognition and its applications. Also included are
C-language programs for (1) a generalized delta-rule net
for supervised learning and (2) unsupervised learning
based on the discovery of clustered structure. 89/00/00
89A51326


UTTL: Microwave diversity imaging and automated target
identification based on models of neural networks

AUTH: A/FARHAT, NABIL H. PAA: A/(Pennsylvania, University,
Philadelphia) IEEE, Proceedings (ISSN 0018-9219), vol.
77, May 1989, p. 670-681. Research supported by DARPA,
USAF, U.S. Army, and NSF.

ABS: It is shown that collective nonlinear signal processing
based on models of neural networks, combined with the use
of suitable target signatures, offers the promise of
robust superresolved target identification from partial
information. Results are presented of numerical
simulations using a neuromorphic processor, where the
neural net performs simultaneously the functions of data
storage, processing and recognition. The results
demonstrate correct identification from as low as 10
percent of full sinogram representations derived from
real data collected in an anechoic chamber environment for
three test targets (scale models of B-52, AWAC, and Space
Shuttle) and taught to the network. Practical
considerations and extensions to real systems are briefly
discussed. 89/05/00 89A49106


UTTL: A unified systolic architecture for artificial
neural networks

AUTH: A/KUNG, S. Y.; B/HWANG, J. N. PAA: A/(Princeton
University, NJ); B/(Southern California, University, Los
Angeles, CA) Journal of Parallel and Distributed
Computing (ISSN 0743-7315), vol. 6, Apr. 1989, p.
358-387. Research supported by SDIO.

ABS: A programmable ring systolic array is presently developed
on the basis of a generic iterative model encompassing
artificial neural networks: single-layer feedback
networks, multilayer feedforward networks, hierarchical
competitive networks, and even some probabilistic models.
The architecture thus obtained maximizes VLSI's advantages
in terms of intensive and pipelined computing, while
circumventing the conventional limitation on
communication; it is therefore recommended as a promising
structural basis for a universal neurocomputer
architecture. 89/04/00 89A41735


UTTL: Back propagation fails to separate where perceptrons
succeed

AUTH: A/BRADY, MARTIN L.; B/RAGHAVAN, RAGHU; C/SLAWNY, JOSEPH
PAA: B/(Lockheed Corp., Palo Alto, CA); C/(Virginia
Polytechnic Institute and State University, Blacksburg)
IEEE Transactions on Circuits and Systems (ISSN
0098-4094), vol. 36, May 1989, p. 665-674. Research
supported by Lockheed Corp.

ABS: It is widely believed that the back propagation algorithm
in neural networks, for tasks such as pattern
classification, overcomes the limitations of the
perceptron. The authors construct several counterexamples
to this belief. They also construct linearly separable
examples which have a unique minimum which fails to
separate two families of vectors, and a simple example
with four two-dimensional vectors in a single-layer
network showing local minima with a large basin of
attraction. Thus, back propagation is guaranteed to fail
in the first example, and likely to fail in the second
example. It is shown that even multilayered (hidden-layer)
networks can also fail in this way to classify linearly
separable problems. Since the authors' examples are all
linearly separable, the perceptron would correctly
classify them. The results disprove the presumption, made
in recent years, that, barring local minima, back
propagation will find the best set of weights for a given
problem. 89/05/00 89A41634


UTTL: Multitarget tracking with cubic energy optical
neural nets

AUTH: A/BARNARD, ETIENNE; B/CASASENT, DAVID P. PAA:
B/(Carnegie-Mellon University, Pittsburgh, PA) Applied
Optics (ISSN 0003-6935), vol. 28, Feb. 15, 1989, p.
791-798. Research supported by SDIO.

ABS: A neural net processor and its optical realization are
described for a multitarget tracking application. A cubic
energy function results and a new optical neural processor
is required. Initial simulation data are presented.
89/02/15 89A32825


UTTL: Supervised learning of probability distributions by
neural networks

AUTH: A/BAUM, ERIC B.; B/WILCZEK, FRANK PAA: A/(California
Institute of Technology, Jet Propulsion Laboratory,
Pasadena); B/(Harvard University, Cambridge, MA) CORP:
Jet Propulsion Lab., California Inst. of Tech., Pasadena.;
Harvard Univ., Cambridge, MA. IN: Neural information
processing systems; Proceedings of the First IEEE
Conference, Denver, CO, Nov. 8-12, 1987 (A89-29002 11-63).
New York, American Institute of Physics, 1988, p. 52-61.
Research supported by DARPA.

ABS: Supervised learning algorithms for feedforward neural
networks are investigated analytically. The
back-propagation algorithm described by Werbos (1974),
Parker (1985), and Rumelhart et al. (1986) is generalized
by redefining the values of the input and output neurons
as probabilities. The synaptic weights are then varied to
follow gradients in the logarithm of likelihood rather
than in the error. This modification is shown to provide a
more rigorous theoretical basis for the algorithm and to
permit more accurate predictions. A typical application
involving a medical-diagnosis expert system is discussed.
88/00/00 89A29008


UTTL: Spaceplanes astronaut's associate control server

AUTH: A/HONG, ROBERT PAA: A/(Grumman Aerospace Corp., Grumman
Aircraft Systems Div., Bethpage, NY) IN: IEEE Conference
on Decision and Control, 27th, Austin, TX, Dec. 7-9, 1988,
Proceedings. Volume 1 (A89-28491 11-63). New York,
Institute of Electrical and Electronics Engineers, Inc.,
1988, p. 149-154.

ABS: The author addresses the extension of the DARPA/US Air
Force Pilot's Associate program to the astronaut's
associate application, and particularly the control server
aspect. Some representative techniques for implementing
this system are discussed. Artificial intelligence (AI)
and neural networks are applied synergistically to achieve
an optimum system. The author examines such issues as
adaptive aiding, performance-seeking control, qualitative
reasoning, neural networks, gradient methods for
connectionist networks, and neural machinery for
spacecraft control. 88/00/00 89A28506


UTTL: Autonomous reconfiguration of sensor systems using
neural nets

AUTH: A/JAKUBOWICZ, OLEG G. PAA: A/(New York, State
University, Buffalo) IN: Sensor fusion; Proceedings of
the Meeting, Orlando, FL, Apr. 4-6, 1988 (A89-26951
10-63). Bellingham, WA, Society of Photo-Optical
Instrumentation Engineers, 1988, p. 197-203.

ABS: The application of neural networks to autonomous agents
(intelligent robots operating in isolated locations) is
discussed, and architectures for implementing
self-repairing sensor and identification systems aboard
autonomous agents are proposed. The example of a
four-layer visual system which identifies visual objects
is considered in which each processor connection is
assigned a weight attribute. It is shown that when one of
the units becomes inoperative, neighboring detectors in
that layer may be used to reprogram the weights connecting
surviving units in order to restore functionality.
88/00/00 89A26975


UTTL: Multisensor integration and fusion - Issues and
approaches

AUTH: A/LUO, REN C.; B/KAY, MICHAEL G. PAA: B/(North Carolina
State University, Raleigh) IN: Sensor fusion;
Proceedings of the Meeting, Orlando, FL, Apr. 4-6, 1988
(A89-26951 10-63). Bellingham, WA, Society of
Photo-Optical Instrumentation Engineers, 1988, p. 42-49.

ABS: Issues concerning the effective integration of multiple
sensors into the operation of intelligent systems are
presented, and a description of some of the general
paradigms and methodologies that address this problem is
given. Multisensor integration, and the related notion of
multisensor fusion, are defined and distinguished. The
potential advantages and problems resulting from the
integration of information from multiple sensors are
discussed. 88/00/00 89A26957


UTTL: Sensor fusion; Proceedings of the Meeting, Orlando,
FL, Apr. 4-6, 1988

AUTH: A/WEAVER, CHARLES B. PAA: A/(Honeywell, Inc.,
Electro-Optics Div., Lexington, MA) Meeting sponsored by
SPIE. Bellingham, WA, Society of Photo-Optical
Instrumentation Engineers (SPIE Proceedings. Volume 931),
1988, 218 p. For individual items see A89-26952 to
A89-26975.

ABS: Papers are presented on multisensor target detection and
classification, a geometric approach to multisensor
fusion, and optimal and suboptimal distributed decision
fusion. Also considered are information fusion
methodology, theoretical approaches to data association
and fusion, and adaptive control of multisensor systems.
Other topics include target acquisition and tracking in
the laser docking sensor, a neural network architecture
for evidence combination, an algorithm for sensor fusion,
and the application of order statistic filters to
detection systems.

RPT#: SPIE-931 88/00/00 89A26951


UTTL: PSRI target recognition in range imagery using
neural networks

AUTH: A/TROXEL, S. E.; B/ROGERS, S. K.; C/KABRISKY, M.;
D/MILLS, J. P. PAA: D/(USAF, Institute of Technology,
Wright-Patterson AFB, OH) IN: Digital and optical shape
representation and pattern recognition; Proceedings of the
Meeting, Orlando, FL, Apr. 4-6, 1988 (A89-23526 08-63).
Bellingham, WA, Society of Photo-Optical Instrumentation
Engineers, 1988, p. 295-301.

ABS: A method for classifying objects invariant to position,
rotation, or scale is presented. Objects to be classified
were multifunction laser radar data of tanks and trucks at
various aspect angles. A segmented Doppler image was used
to mask the range image into candidate targets. Each
target was then compared to stored templates representing
the different classes. A neural network was used to
perform the classification with an accuracy near 100
percent. The neural network used in this study was a
multilayer perceptron using a back propagation algorithm.
88/00/00 89A23556


UTTL: Neural-network technology and its applications

AUTH: A/ROTH, MICHAEL W. PAA: A/(Johns Hopkins University,
Laurel, MD) Johns Hopkins APL Technical Digest (ISSN
0270-5214), vol. 9, July-Sept. 1988, p. 242-253.

ABS: This paper discusses recent developments in neural-network
technology in the areas of models, algorithms, and
special-purpose computational hardware. Special attention
is given to the applications of neural-network technology
in such areas as solutions of complex optimization
problems, communication modems, pattern recognition, and
engineering problems in control systems. 88/09/00
89A18786


UTTL: Artificial neural network approaches to target
recognition

AUTH: A/BOWMAN, CHRISTOPHER PAA: A/(Ball Corp., Ball Systems
Engineering Div., San Diego, CA) IN: AIAA/IEEE Digital
Avionics Systems Conference, 8th, San Jose, CA, Oct.
17-20, 1988, Technical Papers. Part 2 (A89-18051 05-06).
Washington, DC, American Institute of Aeronautics and
Astronautics, 1988, p. 847-857.

ABS: Artificial Neural Network (ANN) technology is being
successfully applied to a variety of pattern recognition
problems. The ANN discovers features itself based upon
user training. Trained ANN's settle fast to good
solutions, thereby providing cost effective self-learned
pattern recognition. This paper describes what ANN's are
and how they are trained. A taxonomy is given along with
ANN dynamics and training equations. ANN system
development methodology is summarized. An application of
ANN's to stereo image matching ANN and multisensor target
recognition avionics is presented.

RPT#: AIAA PAPER 88-4029 88/00/00 89A18179


UTTL: Neural-network implementation of a scan-to-scan
correlation algorithm

AUTH: A/MCCURRY, MAX E. PAA: A/(U.S. Army, Advanced Technology
Directorate, Huntsville, AL) IN: High speed computing;
Proceedings of the Meeting, Los Angeles, CA, Jan. 11, 12,
1988 (A89-14451 03-63). Bellingham, WA, Society of
Photo-Optical Instrumentation Engineers, 1988, p. 85-67.

ABS: This paper presents a neural-network approach to the
problem of multitarget tracking. The problem is formulated
analytically in terms of desired optima and constraints
that make it suitable for solution using the
neural-network formalism of Hopfield-Tank. The results of
computer simulations of a network designed to solve the
problem are presented. 88/00/00 89A14460


UTTL: Generalization of back-propagation to recurrent
neural networks

AUTH: A/PINEDA, FERNANDO J. PAA: A/(Johns Hopkins University,
Laurel, MD) Physical Review Letters (ISSN 0031-9007),
vol. 59, Nov. 9, 1987, p. 2229-2232.

ABS: An adaptive neural network with asymmetric connections is
proposed that is related to the Hopfield (1984) network
with graded neurons. The present back-propagation
algorithm uses a recurrent generalization of the delta
rule of Rumelhart et al. (1986) to adaptively modify the
synaptic weights. The network is architecturally simpler
than the master/slave network of Lapedes and Farber
(1986), and it vectorizes naturally because the units are
homogeneous. 87/11/09 88A18289


UTTL: Engineering cybernetics

AUTH: A/GLORIOSO, R. M. PAA: A/(Massachusetts, University,
Amherst, Mass.) Englewood Cliffs, N.J., Prentice-Hall,
Inc., 1975, 270 p.

ABS: The present work examines the concepts of adaptation,
learning, self-organization, self-repair, game playing by
machines, pattern recognition, and artificial
intelligence, along with some applications of cybernetics
which have emerged so far. The discussion covers
fundamental computer organization and behavior, symbols
and decisions in machines, information, logic, automata,
and search techniques. Specific examples of adaptive,
learning, and self-organizing systems as applied to
control and communications are provided. The principles of
redundant design, fault masking, and repair for creating
reliable systems are discussed. Single and multilevel
threshold logic synthesis are outlined along with
descriptions of the Adaline (adaptive linear element),
Madaline (multiple adaptive linear element), and the
perceptron. Particular attention is devoted to pattern
recognition, where the various aspects of the problem
including systems for both optical and acoustical pattern
recognition, feature extraction, and pattern
classification are defined and analyzed. 75/00/00
76A19444


UTTL: Experiments in image recognition with the aid of
expanding networks

AUTH: A/GLADUN, V. P.; B/MAZAEVA, S. P.; C/SAVA, I. 0.
Problemy Bioniki, no. 6, 1971, p. 63-69. In Russian.

ABS: An image recognition learning algorithm is proposed for a
type of neural nets introduced by Gladun (1970) and called
the expanding type. According to this definition, such
neural nets are progressively built by spare back-up
elements during the process of learning. The elements of
such nets are identified as active inputs, receptors,
associative elements and recognizers, connected by
transmitting and forbidding couplings into a single body.
Computer experiments are described to illustrate the work
of this learning algorithm. 71/00/00 73A15794


UTTL: Memory-based reasoning for advanced launch system
operations

AUTH: A/MYLER, HARLEY R.; B/DUBOIS, DEAN A. CORP: University
of Central Florida, Orlando. CSS: (Dept. of Computer
Engineering.) In its KSC-NASA/UCF Cooperative Agreement
Research Projects 17 p (SEE N91-70698 09-61) 91/00/00
91N70701


UTTL: Cascading a systolic array and a feedforward neural
network for navigation and obstacle avoidance using
potential fields

AUTH: A/PLUMER, EDWARD S. CORP: Stanford Univ., CA. CSS:
(Dept. of Electrical Engineering.)

ABS: A technique is developed for vehicle navigation and
control in the presence of obstacles. A potential function
was devised that peaks at the surface of obstacles and has
its minimum at the proper vehicle destination. This
function is computed using a systolic array and is
guaranteed not to have local minima. A feedforward neural
network is then used to control the steering of the
vehicle using local potential field information. In this
case, the vehicle is a trailer truck backing up. Previous
work has demonstrated the capability of a neural network
to control steering of such a trailer truck backing to a
loading platform, but without obstacles. Now, the neural
network was able to learn to navigate a trailer truck
around obstacles while backing toward its destination. The
network is trained in an obstacle free space to follow the
negative gradient of the field, after which the network is
able to control and navigate the truck to its target
destination in a space of obstacles which may be
stationary or movable.

RPT#: NASA-CR-177575 A-91066 NAS 1.26:177575 91/02/00
91N19771


UTTL: Neural networks in nonlinear aircraft control

AUTH: A/LINSE, DENNIS CORP: Princeton Univ., NJ. CSS:
(Dept. of Mechanical and Aerospace Engineering.) In NASA,
Langley Research Center, Joint University Program for Air
Transportation Research, 1989-1990 p 151-161 (SEE
N91-19024 11-01)

ABS: Recent research indicates that artificial neural networks
offer interesting learning or adaptive capabilities. The
current research focuses on the potential for application
of neural networks in a nonlinear aircraft control law.
The current work has been to determine which networks are
suitable for such an application and how they will fit
into a nonlinear control law. 90/13/00 91N19037


UTTL: Neural networks as a control methodology
AUTH: A/MCCULLOUGH, CLAIRE L. CORP: Alabama Univ., Huntsville.
CSS: (Dept. of Electrical and Computer Engineering.) In
Alabama Univ., Research Reports: 1990 NASA/ASEE Summer
Faculty Fellowship Program 8 p (SEE N91-18967 10-99)

ABS: While conventional computers must be programmed in a
logical fashion by a person who thoroughly understands the
task to be performed, the motivation behind neural
networks is to develop machines which can train themselves
to perform tasks, using available information about
desired system behavior and learning from experience.
There are three goals of this fellowship program: (1) to
evaluate various neural net methods and generate computer
software to implement those deemed most promising on a
personal computer equipped with Matlab; (2) to evaluate
methods currently in the professional literature for
system control using neural nets to choose those most
applicable to control of flexible structures; and (3) to
apply the control strategies chosen in (2) to a computer
simulation of a test article, the Control Structures
Interaction Suitcase Demonstrator, which is a portable
system consisting of a small flexible beam driven by a
torque motor and mounted on springs tuned to the first
flexible mode of the beam. Results of each are discussed.
90/10/00 91N18997


UTTL: Optimal control by neural networks
AUTH: A/BANKS, S. P.; B/HARRISON, R. F. CORP: Sheffield Univ.
(England). CSS: (Dept. of Control Engineering.)

ABS: A neural network for the implementation of a nonlinear
optimal controller is developed, based on an energy
minimization principle. The theory is applicable to any
nonlinear problem with a quadratic cost functional,
although it would be easy to extend it to nonquadratic
functionals. A simple example of a scalar, linear,
quadratic problem is presented.

RPT#: RR-399 ETN-91-98527 90/06/14 91N15797


UTTL: Massively parallel network architectures for
automatic recognition of visual speech signals
AUTH: A/SEJNOWSKI, TERRENCE J.; B/GOLDSTEIN, MOISE CORP:
Johns Hopkins Univ., Baltimore, MD.

ABS: This research sought to produce a massively parallel
network architecture that could interpret speech signals
from video recordings of human talkers. The project's
results are summarized: (1) A corpus of video recordings
from two human speakers was analyzed with image processing
techniques and used as the data for this study; (2) It was
demonstrated that a feedforward network could be trained
to categorize vowels from these talkers (the performance
was comparable to that of the nearest neighbors techniques
and to trained humans on the same data); (3) A novel
approach was developed to sensory fusion by training a
network to transform from facial images to short-time
spectral amplitude envelopes. This information can be used
to increase the signal to noise ratio and hence the
performance of acoustic speech recognition systems in
noisy environments; and (4) The use was explored of
recurrent networks to perform the same mapping for
continuous speech. Results demonstrate the feasibility of
adding a visual speech recognition component to enhance
existing speech recognition systems. Such a combined
system could be used in noisy environments, such as
cockpits, where improved communication is needed. This
demonstration of presymbolic fusion of visual and acoustic
speech signals is consistent with the current
understanding of human speech perception.

RPT#: AD-A226968 AFOSR-90-0949TR 90/00/00 91N14805


UTTL: Applications of neural networks to adaptive control

AUTH: A/SCOTT, RUSSELL W., II CORP: Naval Postgraduate School,
Monterey, CA.

ABS: The amount of a priori knowledge required to design some
modern control systems is becoming prohibitive. Two
current methods addressing this problem are robust
control, in which the control design is insensitive to
errors in system knowledge, and adaptive control, in which
the control law is adjusted in response to a continually
updated model of the system. This thesis examines the
application of parallel distributed processing (neural
networks) to the problem of adaptive control. The
structure of neural networks is introduced, focusing on
the Backpropagation paradigm. A general form of controller
consistent with use in neural networks is developed and
combined with a discussion of linear least squares
parameter estimation techniques to suggest a structure for
neural network adaptive controllers. This neural network
adaptive control structure is then applied to a number of
estimation and control problems using as a model the
longitudinal motion of the A-4 aircraft. The purpose of
this thesis is to develop and demonstrate a neural network
adaptive control structure consistent with adaptive
control theory.

RPT#: AD-A225408 89/12/00 91N13938


UTTL: Target detection in Gaussian noise using artificial
neural systems

AUTH: A/SOLKA, JEFFREY L.; B/ROGERS, GEORGE CORP: Naval
Surface Warfare Center, Dahlgren, VA. CSS: (Strategic
Systems Dept.)

ABS: Radar signal processing with multilayered perceptrons was
investigated. Networks with no hidden layer and a single
hidden layer were tested on field collected millimeter
wave target returns that have been corrupted with
artificial Gaussian noise at a signal to noise level of 3
dB. Performance as a function of network architecture was
characterized.

RPT#: AD-A223983 NSWC/TR-90-171 90/06/00 90N28770


UTTL:  Analog  hardwat  for  learning  neural  networks 

AUTH  A/ESERHART.  SILVIO  P_.  PAA .  A/(0ot  Propulsion  Lab. . 

California  Inst,  of  Tech.,  Pasadena.)  .  CORP:  National 
Aeronautics  and  Space  Administration.  Pasadena  office.  CA. 

;  Jet  Propulsion  Lab..  California  Inst,  of  Tech.. 
Pasadena. 

ABS: This is a recurrent or feedforward analog neural network
processor having a multi-level neuron array and a synaptic
matrix for storing weighted analog values of synaptic
connection strengths which is characterized by temporarily
changing one connection strength at a time to determine
its effect on system output relative to the desired
target. That connection strength is then adjusted based on
the effect, whereby the processor is taught the correct
response to training examples connection by connection.

RPT#: NASA-CASE-NPO-17664-1-CU NAS 1.71:NPO-17664-1-CU
US-PATENT-APPL-SN-463720 89/12/28 90N27384
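
The connection-by-connection teaching scheme described in the abstract above can be illustrated with a minimal software sketch. This is not the patented analog circuit: the single linear neuron, the squared-error measure, and the names `train_by_perturbation`, `delta`, and `rate` are all assumptions chosen for illustration. The sketch only shows the general idea of perturbing one weight at a time, measuring the effect on output error, and adjusting that weight accordingly.

```python
def forward(weights, x):
    # Hypothetical single linear neuron: weighted sum of the inputs.
    return sum(w * xi for w, xi in zip(weights, x))


def squared_error(weights, examples):
    # Total squared error between outputs and desired targets.
    return sum((forward(weights, x) - t) ** 2 for x, t in examples)


def train_by_perturbation(weights, examples, delta=1e-3, rate=0.05, epochs=200):
    weights = list(weights)
    for _ in range(epochs):
        for i in range(len(weights)):
            saved = weights[i]
            base = squared_error(weights, examples)
            # Temporarily change one connection strength.
            weights[i] = saved + delta
            effect = (squared_error(weights, examples) - base) / delta
            # Adjust that connection based on the measured effect.
            weights[i] = saved - rate * effect
    return weights


# Toy training set (assumed): targets generated by weights (2, -1).
examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
trained = train_by_perturbation([0.0, 0.0], examples)
```

Because each weight needs only an output-error measurement, no derivative has to be computed analytically, which is presumably what makes the approach attractive for analog hardware.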


UTTL: OMS FDIR: Initial prototyping

AUTH: A/TAYLOR, ERIC W.; B/HANSON, MATTHEW A. CORP: Ford
Aerospace and Communications Corp., Sunnyvale, CA. In
NASA, Lyndon B. Johnson Space Center, Third Annual
Workshop on Space Operations Automation and Robotics (SOAR
1989) p 545-549 (SEE N90-25503 19-59)

ABS: The Space Station Freedom Program (SSFP) Operations
Management System (OMS) will automate major management
functions which coordinate the operations of onboard
systems, elements and payloads. The objectives of OMS are
to improve safety, reliability and productivity while
reducing maintenance and operations cost. This will be
accomplished by using advanced automation techniques to
automate much of the activity currently performed by the
flight crew and ground personnel. OMS requirements have
been organized into five task groups: (1) Planning,
Execution and Replanning; (2) Data Gathering,
Preprocessing and Storage; (3) Testing and Training; (4)
Resource Management; and (5) Caution and Warning and Fault
Management for onboard subsystems. The scope of this
prototyping effort falls within the Fault Management
requirements group. The prototyping will be performed in
two phases. Phase 1 is the development of an onboard
communications network fault detection, isolation, and
reconfiguration (FDIR) system. Phase 2 will incorporate
global FDIR for onboard systems. Research into the
applicability of expert systems, object-oriented
programming, fuzzy sets, neural networks and other
advanced techniques will be conducted. The goals and
technical approach for this new SSFP research project are
discussed here. 90/03/00 90N25562


UTTL: A comparison of two neural network schemes for
navigation

AUTH: A/MUNRO, PAUL CORP: Pittsburgh Univ., PA. CSS: (Dept.
of Information Science.) In NASA, Lyndon B. Johnson
Space Center, Third Annual Workshop on Space Operations
Automation and Robotics (SOAR 1989) p 305-310 (SEE
N90-25503 19-59)

ABS: Neural networks have been applied to tasks in several
areas of artificial intelligence, including vision,
speech, and language. Relatively little work has been done
in the area of problem solving. Two approaches to
path-finding are presented, both using neural network
techniques. Both techniques require a training period.
Training under the back propagation (BPL) method was
accomplished by presenting representations of (current
position, goal position) pairs as input and appropriate
actions as output. The Hebbian/interactive activation
(HIA) method uses the Hebbian rule to associate points
that are nearby. A path to a goal is found by activating a
representation of the goal in the network and processing
until the current position is activated above some
threshold level. BPL, using back-propagation learning,
failed to learn, except in a very trivial fashion, that is
equivalent to table lookup techniques. HIA performed much
better, and required storage of fewer weights. In drawing
a comparison, it is important to note that back
propagation techniques depend critically upon the forms of
representation used, and can be sensitive to parameters in
the simulations; hence the BPL technique may yet yield
strong results. 90/03/00 90N25536


UTTL: A comparison of two neural network schemes for
navigation

AUTH: A/MUNRO, PAUL W. CORP: Pittsburgh Univ., PA. CSS:
(Dept. of Information Science.) In Texas A&M Univ.,
NASA/ASEE Summer Faculty Fellowship Program, 1989, Volume 2
10 p (SEE N90-24985 18-80)

ABS: Neural networks have been applied to tasks in several
areas of artificial intelligence, including vision,
speech, and language. Relatively little work has been done
in the area of problem solving. Two approaches to
path-finding are presented, both using neural network
techniques. Both techniques require a training period.
Training under the back propagation (BPL) method was
accomplished by presenting representations of (current
position, goal position) pairs as input and appropriate
actions as output. The Hebbian/interactive activation
(HIA) method uses the Hebbian rule to associate points
that are nearby. A path to a goal is found by activating a
representation of the goal in the network and processing
until the current position is activated above some
threshold level. BPL, using back-propagation learning,
failed to learn, except in a very trivial fashion, that is
equivalent to table lookup techniques. HIA performed much
better, and required storage of fewer weights. In drawing
a comparison, it is important to note that back
propagation techniques depend critically upon the forms of
representation used, and can be sensitive to parameters in
the simulations; hence the BPL technique may yet yield
strong results. 89/12/00 90N24991


UTTL: Neuromorphic optical signal processing and image
understanding for automated target recognition

AUTH: A/FARHAT, NABIL H. CORP: Pennsylvania Univ.,
Philadelphia.

ABS: The goal of research is study of computation and learning
in neural net models and demonstration of their utility in
image understanding and neuromorphic information
processing systems for remote sensing and target
identification. The approach to achieving this goal has
two facets. One is combining innovative architectures and
methodologies with suitable algorithms to exploit existing
and emerging photonic technology in the implementation of
large-scale neurocomputers for use in: the study of
complex self-organising and learning systems, fast
solution of optimisation problems, feature extraction
(formation of object representation), and pattern
recognition. The second facet of the approach is to
demonstrate and assess the capabilities of neuromorphic
processing in solution of selected inverse-scattering and
recognition problems. The problem studied as a test bed
for the work is that of automated radar target recognition
because of the existing capabilities and expertise in this
area.

RPT#: AD-A219827 EO/MO-89-1 89/12/00 90N23884


UTTL: Neural networks in support of manned space

AUTH: A/WERBOS, PAUL J. CORP: National Science Foundation,
Washington, DC. In Jet Propulsion Lab., California Inst.
of Tech., Proceedings of the 3rd Annual Conference on
Aerospace Computational Control, Volume 2 p 916 (SEE
N90-23040 16-61)

ABS: Many lobbyists in Washington have argued that artificial
intelligence (AI) is an alternative to manned space
activity. In actuality, this is the opposite of the truth,
especially as regards artificial neural networks (ANNs),
that form of AI which has the greatest hope of mimicking
human abilities in learning, ability to interface with
sensors and actuators, flexibility and balanced judgement.
ANNs and their relation to expert systems (the more
traditional form of AI), and the limitations of both
technologies are briefly reviewed. A few highlights of
recent work on ANNs, including an NSF-sponsored workshop
on ANNs for control applications, are given. Current
thinking on ANNs for use in certain key areas (the
National Aerospace Plane, teleoperation, the control of
large structures, fault diagnostics, and docking) which
may be crucial to the long term future of man in space is
discussed. 89/12/15 90N23088


(e.g., turbopump blades). The generality of the approach
is such that load/damage mappings can be directly
extracted from experimental data without requiring any
knowledge of the stress/strain profile of the component.
In addition, the parallel network architecture allows
real-time life calculations even for high frequency
vibrations. Owing to its distributed nature, the neural
implementation will be robust and reliable, enabling its
use in hostile environments such as rocket engines. This
neural net estimator of fatigue life is seen as the
enabling technology to achieve component life prognosis,
and therefore would be an important part of life extending
control for reusable rocket engines.

RPT#: NASA-TM-103117 E-5217 NAS 1.15:103117 90/00/00
90N21984


UTTL: Neural networks for aircraft control

AUTH: A/LINSE, DENNIS CORP: Princeton Univ., NJ. CSS: (Dept.
of Mechanical and Aerospace Engineering.) In NASA,
Langley Research Center, Joint University Program for Air
Transportation Research, 1988-1989 p 167-181 (SEE
N90-20921 14-01)

ABS: Current research in Artificial Neural Networks indicates
that networks offer some potential advantages in
adaptation and fault tolerance. This research is directed
at determining the possible applicability of neural
networks to aircraft control. The first application will
be to aircraft trim. Neural network node characteristics,
network topology and operation, neural network learning
and example histories using neighboring optimal control
with a neural net are discussed. 90/03/00 90N20937


UTTL: Computation and control with neural nets

AUTH: A/CORNELIUSEN, A.; B/TERDAL, P.; C/KNIGHT, T.;
D/SPENCER, J. CORP: Stanford Linear Accelerator Center,
CA. Presented at the International Conference on
Accelerator and Large Experimental Physics Control
Systems, Vancouver, British Columbia, 30 Oct. - 3 Nov.
1989

ABS: As energies have increased exponentially with time so have
the size and complexity of accelerators and control
systems. Neural nets (NN) may offer the kinds of
improvements in computation and control that are needed to
maintain acceptable functionality. For control their
associative characteristics could provide signal
conversion or data translation. Because they can do any
computation such as least squares, they can close feedback
loops autonomously to provide intelligent control at the
point of action rather than at a central location that
requires transfers, conversions, hand-shaking and other
costly repetitions like input protection. Both computation
and control can be integrated on a single chip, printed
circuit or an optical equivalent that is also inherently
faster through full parallel operation. For such reasons
one expects lower costs and better results. Such systems
could be optimized by integrating sensor and signal
processing functions. Distributed nets of such hardware
could communicate and provide global monitoring and
multiprocessing in various ways, e.g., via token, slotted
or parallel rings (or Steiner trees) for compatibility
with existing systems. Problems and advantages of this
approach such as an optimal, real-time Turing machine are
discussed. Simple examples are simulated and hardware
implemented using discrete elements.

RPT#: DE90-006460 SU-SLAC-PUB-5035 CONF-891094-14 89/10/00
90N18911
UTTL: ALVINN: An Autonomous Land Vehicle In a Neural
Network

AUTH: A/POMERLEAU, DEAN A. CORP: Carnegie-Mellon Univ.,
Pittsburgh, PA.; Pittsburgh Univ., PA. CSS: (Artificial
Intelligence and Psychology Project.) Presented at the
IEEE Conference on Neural Information Processing Systems
Natural and Synthetic, Denver, CO, Nov. 1988

ABS: ALVINN (Autonomous Land Vehicle In a Neural Network) is a
3 layer back propagation network designed for the task of
road following. Currently ALVINN takes images from a
camera and a laser range finder as input and produces as
output the direction the vehicle should travel in order to
follow the road. Training was conducted using simulated
road images. Successful tests on the Carnegie Mellon
autonomous navigation test vehicle indicate that the
network can follow real roads under certain field
conditions. The representation developed to perform the
task differs greatly when the network is trained under
various conditions, suggesting the possibility of a novel
adaptive autonomous navigation system capable of tailoring
its processing to the conditions at hand.

RPT#: AD-A218975 AIP-77 89/01/00 90N22797


UTTL: A real time neural net estimator of fatigue life

AUTH: A/TROUDET, T.; B/MERRILL, W. PAA: A/(Sverdrup
Technology, Inc., Cleveland, OH.) CORP: National
Aeronautics and Space Administration. Lewis Research
Center, Cleveland, OH. Presented at the International
Joint Conference on Neural Networks, San Diego, CA, 17-21
Jun. 1990; cosponsored by IEEE and INNS

ABS: A neural net architecture is proposed to estimate, in
real-time, the fatigue life of mechanical components, as
part of the Intelligent Control System for Reusable Rocket
Engines. Arbitrary component loading values were used as
input to train a two hidden-layer feedforward neural net
to estimate component fatigue damage. The ability of the
net to learn, based on a local strain approach, the
mapping between load sequence and fatigue damage has been
demonstrated for a uniaxial specimen. Because of its
demonstrated performance, the neural computation may be
extended to complex cases where the loads are biaxial or
triaxial, and the geometry of the component is complex


UTTL: Neurobeamformer 2: Further exploration of adaptive
beamforming via neural networks

AUTH: A/SPEIDEL, S. L. CORP: Naval Ocean Systems Center, San
Diego, CA. CSS: (Analysis Branch.)

ABS: This paper discussed neural network technology as a tool
for signal processing. Test results show that the adaptive
beamformer method, based on neural network technology,
performs the desired function of directing a beam so as to
enhance a target signal and reject noise and interference.
Comparing test output values with a matched-correlation
output shows that the plotted crossbar circuit energy
minima follow the shape of an inverted match-filter
output. The neurobeamformer has certain advantages of
implementation and adaptability over other methods. In
concept, it is implementable in analog circuitry with no
control code required. Thus a compact, simple, low-cost
processor component that is not sensitive to array
grooming can be produced. A straightforward adaptive
beamformer cannot match the interference-cancellation
performance of more exotic methods, which include sidelobe
cancellers. So, a neuroprocessor that will include a
neurobeamformer as a component will be built. This
neuroprocessor will provide for cancellation of sidelobes,
enhance source discrimination and angle-estimation through
interaction of beams. Plans for this extended network were
influenced by studies of the literature in biological
sensory processing, both peripheral and central.

RPT#: AD-A215118 NOSC/TD-1606 89/06/00 90N18226


UTTL: Knowledge-based imaging-sensor fusion system

AUTH: A/WESTROM, GEORGE CORP: Odetics, Inc., Anaheim, CA. In
NASA, Langley Research Center, Visual Information
Processing for Television and Telerobotics p 215-229 (SEE
N90-16204 08-35)

ABS: An imaging system which applies knowledge-based technology
to supervise and control both sensor hardware and
computation in the imaging system is described. It
includes the development of an imaging system breadboard
which brings together into one system work that we and
others have pursued for LaRC for several years. The goal
is to combine Digital Signal Processing (DSP) with




of tactical targets using a new biologically-based neural
network. The targets of interest were generated from
Doppler imagery and forward looking infrared imagery, and
consisted of tanks, trucks, armored personnel carriers,
jeeps and petroleum, oil, and lubricant tankers. Each
target was described by feature vectors, such as
normalized moment invariants. The features were generated
from the imagery using a segmenting process. These feature
vectors were used as the input to a neural network
classifier for tactical target recognition. The neural
network consisted of a multilayer perceptron architecture,
employing a backward error propagation learning algorithm.
The minimization technique used was an approximation to
Newton's method. This second order algorithm is a
generalized version of well known first order techniques,
i.e., gradient or steepest descent and momentum methods.
Classification using both first and second order
techniques was performed, with comparisons drawn.

RPT#: AD-A202666 AFIT/GE/ENG/88D-36 88/12/00 89N21644


UTTL: Automatic voice recognition using traditional and
artificial neural network approaches

AUTH: A/BOTROS, NAZEIH M. CORP: University of Southern
Illinois, Carbondale. CSS: (Dept. of Electrical
Engineering.) In NASA, Lyndon B. Johnson Space Center,
National Aeronautics and Space Administration
(NASA)/American Society for Engineering Education (ASEE)
Summer Faculty Fellowship Program 1988, Volume 1 13 p (SEE
N89-20058 12-99)

ABS: The main objective of this research is to develop an
algorithm for isolated-word recognition. This research is
focused on digital signal analysis rather than linguistic
analysis of speech. Features extraction is carried out by
applying a Linear Predictive Coding (LPC) algorithm with
order of 10. Continuous-word and speaker independent
recognition will be considered in future study after
accomplishing this isolated word research. To examine the
similarity between the reference and the training sets,
two approaches are explored. The first is implementing
traditional pattern recognition techniques where a dynamic
time warping algorithm is applied to align the two sets
and calculate the probability of matching by measuring the
Euclidean distance between the two sets. The second is
implementing a backpropagation artificial neural net model
with three layers as the pattern classifier. The
adaptation rule implemented in this network is the
generalized least mean square (LMS) rule. The first
approach has been accomplished. A vocabulary of 50 words
was selected and tested. The accuracy of the algorithm was
found to be around 85 percent. The second approach is in
progress at the present time. 89/02/00 89N20064
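
The dynamic time warping step mentioned in the abstract above can be sketched as follows. This is a generic textbook formulation, not the author's implementation: the function names, the toy one-dimensional "feature vectors" (standing in for LPC coefficient frames), and the nearest-template decision rule are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    # Local distance between two feature frames.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref, test):
    # Accumulated Euclidean cost of the best time alignment of two sequences.
    n, m = len(ref), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            # Allow a match, or a stretch of either sequence.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, utterance):
    # Pick the reference word whose template aligns at the lowest cost.
    return min(templates, key=lambda word: dtw_distance(templates[word], utterance))


# Hypothetical reference templates and a time-stretched test utterance.
templates = {
    "yes": [(0.0,), (1.0,), (2.0,)],
    "no": [(5.0,), (5.0,), (5.0,)],
}
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
best = recognize(templates, utterance)
```

The warping lets repeated frames in the slower utterance map onto a single template frame, so a stretched copy of a word still aligns at zero cost.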


Knowledge-Based Processing and also include Neural Net
processing. The system is considered a smart camera.
Imagine that there is a microgravity experiment on-board
Space Station Freedom with a high frame rate, high
resolution camera. All the data cannot possibly be
acquired from a laboratory on Earth. In fact, only a small
fraction of the data will be received. Again, imagine
being responsible for some experiments on Mars with the
Mars Rover: the data rate is a few kilobits per second for
data from several sensors and instruments. Would it not be
preferable to have a smart system which would have some
human knowledge and yet follow some instructions and
attempt to make the best use of the limited bandwidth for
transmission? The system concept, current status of the
breadboard system and some recent experiments at the
Mars-like Amboy Lava Fields in California are discussed.
89/11/00 90N16220

UTTL: Real-time support for high performance aircraft
operation

AUTH: A/VIDAL, JACQUES J. CORP: California Univ., Los Angeles.
CSS: (Dept. of Computer Science.)

ABS: The feasibility of real-time processing schemes using
artificial neural networks (ANNs) is investigated. A
rationale for digital neural nets is presented and a
general processor architecture for control applications is
illustrated. Research results on ANN structures for
real-time applications are given. Research results on ANN
algorithms for real-time control are also shown.

RPT#: NASA-CR-185475 NAS 1.26:185475 89/01/00 90N10075


UTTL: Integration of perception and reasoning in fast
neural modules

AUTH: A/FRITZ, DAVID G. CORP: George Washington Univ.,
Washington, DC.; Cognitive Information Systems Co.,
Silver Spring, MD. CSS: (Inst. for Artificial
Intelligence.) In NASA, Goddard Space Flight Center, The
1989 Goddard Conference on Space Applications of
Artificial Intelligence p 349-356 (SEE N89-26578 20-63)

ABS: Artificial neural systems promise to integrate symbolic
and sub-symbolic processing to achieve real time control
of physical systems. Two potential alternatives exist. In
one, neural nets can be used to front-end expert systems.
The expert systems, in turn, are developed with varying
degrees of parallelism, including their implementation in
neural nets. In the other, rule-based reasoning and sensor
data can be integrated within a single hybrid neural
system. The hybrid system reacts as a unit to provide
decisions (problem solutions) based on the simultaneous
evaluation of data and rules. Discussed here is a model
hybrid system based on the fuzzy cognitive map (FCM). The
operation of the model is illustrated with the control of
a hypothetical satellite that intelligently alters its
attitude in space in response to an intersecting
micrometeorite shower. 89/04/00 89N26603


UTTL: Empirical analysis and refinement of expert system
knowledge bases

AUTH: A/WEISS, SHOLOM M.; B/KULIKOWSKI, CASIMIR A. CORP:
Rutgers - The State Univ., New Brunswick, NJ. CSS:
(Center for Expert Systems Research.)

ABS: Classification methods from statistical pattern
recognition, neural nets, and machine learning were
applied to four real-world data sets. Each of these data
sets has been previously analyzed and reported in the
statistical, medical, or machine learning literature. The
data sets are characterized by statistical uncertainty;
there is no completely accurate solution to these
problems. Training and testing or resampling techniques
are used to estimate the true error rates of
classification methods. Detailed attention is given to the
analysis of performance of the neural nets using back
propagation. For these problems, which have relatively few
hypotheses and features, the machine learning procedures
for rule induction or tree induction clearly performed
best.

RPT#: AD-A206226 89/02/28 89N24858


UTTL: Neuromorphic learning of continuous-valued mappings
in the presence of noise: Application to real-time
adaptive control

AUTH: A/TROUDET, TERRY; B/MERRILL, WALTER C. PAA: A/(Sverdrup
Technology, Inc., Cleveland, OH.) CORP: National
Aeronautics and Space Administration. Lewis Research
Center, Cleveland, OH. Presented at the International
Conference on Neural Networks, Washington, DC, 18-22 Jun.
1989; sponsored by the IEEE

ABS: The ability of feed-forward neural net architectures to
learn continuous-valued mappings in the presence of noise
is demonstrated in relation to parameter identification
and real-time adaptive control applications. Factors and
parameters influencing the learning performance of such
nets in the presence of noise are identified. Their
effects are discussed through a computer simulation of the
Back-Error-Propagation algorithm by taking the example of
the cart-pole system controlled by a nonlinear control
law. Adequate sampling of the state space is found to be
essential for canceling the effect of the statistical
fluctuations and allowing learning to take place.

RPT#: NASA-TM-101999 E-4706 NAS 1.15:101999 89/00/00
89N24856


UTTL: Modified backward error propagation for tactical
target recognition

AUTH: A/PIAZZA, CHARLES C. CORP: Air Force Inst. of Tech.,
Wright-Patterson AFB, OH. CSS: (School of Engineering.)

ABS: This thesis explores a new approach to the classification


UTTL:  Simulation  tests  of  the  optimization  method  of 
Hopfield  and  Tank  using  neural  networks 

AUTH: A/PAIELLI, RUSSELL A. CORP: National Aeronautics and
Space Administration. Ames Research Center, Moffett Field,
CA.

ABS: The method proposed by Hopfield and Tank for using the
Hopfield neural network with continuous valued neurons to
solve the traveling salesman problem is tested by
simulation. Several researchers have apparently been
unable to successfully repeat the numerical simulation
documented by Hopfield and Tank. However, as suggested to
the author by Adams, it appears that the reason for those
difficulties is that a key parameter value is reported
erroneously (by four orders of magnitude) in the original
paper. When a reasonable value is used for that parameter,
the network performs generally as claimed. Additionally, a
new method of using feedback to control the input bias
currents to the amplifiers is proposed and successfully
tested. This eliminates the need to set the input currents
by trial and error.

RPT#: NASA-TM-101047 A-88275 NAS 1.15:101047 88/11/00
89N14004


UTTL: Genetic algorithms for adaptive real-time control in
space systems

AUTH: A/VANDERZIJP, J.; B/CHOUDRY, A. CORP: Alabama Univ.,
Huntsville. CSS: (Center for Applied Optics.) In NASA,
Marshall Space Flight Center, Third Conference on
Artificial Intelligence for Space Applications, Part 2 p
47-51 (SEE N88-24188 17-61)

ABS: Genetic Algorithms that are used for learning as one way
to control the combinational explosion associated with the
generation of new rules are discussed. The Genetic
Algorithm approach tends to work best when it can be
applied to a domain independent knowledge representation.
Applications to real time control in space systems are
discussed. 88/06/00 88N24195


UTTL: Third Conference on Artificial Intelligence for
Space Applications, part 2

AUTH: A/DENTON, JUDITH S.; B/FREEMAN, MICHAEL S.; C/VEREEN,
MARY CORP: National Aeronautics and Space
Administration. Marshall Space Flight Center, Huntsville,
AL. Conference held in Huntsville, Ala., 2-3 Nov. 1987;
sponsored by NASA, Marshall Space Flight Center,
Huntsville, Ala. and Alabama Univ., Huntsville

ANN: Topics relative to the application of artificial
intelligence to space operations are discussed. New
technologies for space station automation, design data
capture, computer vision, neural nets, automatic
programming, and real time applications are discussed.
For individual titles, see N88-24189 through N88-24197.

RPT#: NASA-CP-2492-PT-2 M-576-PT-2 NAS 1.55:2492-PT-2 88/06/00
88N24188
UTTL: Memory evaluations of nonlinear stochastic
equations and C3 applications

AUTH: A/CONNELL, JOHN C., JR. CORP: Naval Postgraduate School,
Monterey, CA.

ABS: The Statistical Mechanical Neural Computer (SMNC)
developed in this thesis utilizes a Statistical Mechanical
Nonlinear Algorithm (SMNA) to determine the long-time
probability distribution of highly nonlinear stochastic
systems. The use of the SMNA and a novel mesoscopic
scaling technique help provide the SMNC with the
capabilities of neural computers without the drawbacks of
huge connection matrices and their attendant computational
requirements. In this thesis, the SMNC is initially used
to verify the ability of the SMNA to duplicate relatively
simple, single variable path integral solutions to
nonlinear Fokker-Planck equations. After the fundamental
algorithms are validated, the SMNC's ability to simulate a
two-variable, multicellular problem by modeling a portion
of the neocortex consisting of 100,000 neural units is
discussed. There are many important applications of the
SMNC and its unique SMNA to C3 systems, including radar,
sonar and electronic signals processing, missile guidance
systems and an integrated battle management system. Such
C3 systems will benefit from the SMNC's potential to
efficiently filter large amounts of data, recognize
patterns and anticipate, with some degree of uncertainty,
the future state of highly nonlinear stochastic systems.

RPT#: AD-A189872 87/12/00 88N22569


UTTL: Position, scale, and rotation invariant target
recognition using range imagery

AUTH: A/TROXEL, STEVEN E. CORP: Air Force Inst. of Tech.,
Wright-Patterson AFB, OH. CSS: (School of Engineering.)

ABS: This thesis explores a new approach to the recognition of
tactical targets using a multifunction laser radar sensor.
Targets of interest were tanks, jeeps, and trucks. Doppler
images were segmented and overlaid onto a relative range
image. The resultant shapes were then transformed into a
position, scale, and rotation invariant (PSRI) feature
space. The classification processes used the correlation
peak of the template PSRI space and the target PSRI space
as features. Two classification methods were implemented:
a classical distance measurement approach and a new
biologically-based neural network multilayer perceptron
architecture. Both methods demonstrated classification
rates near 100 percent with a true rotation invariance
demonstrated up to 20 degrees. Neural networks were shown
to have a distinct advantage in a robust environment and
when a figure of merit criteria was applied. A space
domain correlation was developed using local normalization
and multistage processing to locate and classify targets
in high clutter and with partially occluded targets.

RPT#: AD-A188828 AFIT/GEO/ENG/87D-3 87/12/00 88N19772

UTTL: Automated radar target recognition based on models
of neural nets

AUTH: A/MIYAHARA, SHUNJI CORP: Pennsylvania Univ.,
Philadelphia.

ABS: Two methods of target recognition are proposed: (1) the
use of sinogram representations as learning set in
associative memory, based on models of neural nets as
classifier; and (2) use of polarization representation for
use in neural net based associative memory as a
classifier. Using microwave scattering data of scaled
model targets, the concepts for the target recognition
were demonstrated by computer simulation of a 1024 (32 by
32) element neural net associative memory based on the
outer product model. The simulations show that partial
input, consisting of less than 10 percent of the total
information, can identify the targets. Two-dimensional
optical implementations of a neural net of 8 by 8 binary
neurons were studied. Fault tolerance and robustness were
examined, using a four-dimensional clipped outer product
ternary Tijkl mask to establish the weighted
interconnections of the [...] and electronic feedback based
on closed-loop TV systems. The performance was found to be
in agreement with that of computer simulation, even though
aberration of lenses and the defects of the system were
present. These results confirm the practical suitability
of the opto-electronic approach to the neural net
implementation and pave the way for the implementation of
larger networks. 87/00/00 88N18804

UTTL: Teaching artificial neural systems to drive: Manual
training techniques for autonomous systems
AUTH: A/SHEPANSKI, J. F.; B/MACY, S. A. CORP: TRW, Inc.,
Redondo Beach, CA. In NASA. Lyndon B. Johnson Space
Center, Houston, Texas, First Annual Workshop on Space
Operations Automation and Robotics (SOAR 87) p 231-238
(SEE N88-17206 09-59)

ABS: A methodology was developed for manually training
autonomous control systems based on artificial neural
systems (ANS). In applications where the rule set
governing an expert's decisions is difficult to formulate,
ANS can be used to extract rules by associating the
information an expert receives with the actions taken.
Properly constructed networks imitate rules of behavior
that permit them to function autonomously when they are
trained on the spanning set of possible situations. This
training can be provided manually, either under the direct
supervision of a system trainer, or indirectly using a
background mode where the network assimilates training
data as the expert performs its day-to-day tasks. To
demonstrate these methods, an ANS network was trained to
drive a vehicle through simulated freeway traffic.

87/10/00 88N17238


UTTL: NASA JSC neural network survey results
AUTH: A/GREENWOOD, DAN CORP: Netrologic, Inc., San Diego, CA.
In NASA. Lyndon B. Johnson Space Center, Houston, Texas,
First Annual Workshop on Space Operations Automation and
Robotics (SOAR 87) p 97-110 (SEE N88-17206 09-59)

ABS: A survey of Artificial Neural Systems in support of NASA's
(Johnson Space Center) Automatic Perception for Mission
Planning and Flight Control Research Program was
conducted. Several of the world's leading researchers
contributed papers containing their most recent results on
artificial neural systems. These papers were broken into
categories and descriptive accounts of the results make up
a large part of this report. Also included is material on
sources of information on artificial neural systems such
as books, technical reports, software tools, etc.

87/10/00 88N17230


UTTL: Models of the vestibular system and postural control
AUTH: A/YOUNG, L. R.; B/WEISS, A. PAA: B/(Mass. Eye and Ear
Infirmary) CORP: Massachusetts Inst. of Tech.,
Cambridge. In NASA. Ames Res. Center Technol. and the
Neurologically Handicapped p 151-168 (SEE N75-19975

ABS: Applications of control theory and systems analysis to the
problem of orientation and posture control are discussed,
with the possible long range goals of contributing to the
development of hardware for rehabilitation of the
handicapped. 74/00/00 75N19992


UTTL: The brain as a model for LSI
AUTH: A/ALBUS, J. S. CORP: National Aeronautics and Space
Administration. Goddard Space Flight Center, Greenbelt,
MD. IN ITS SIGNIFICANT ACCOMPLISHMENTS IN SCI. AND
TECHNOL. AT GODDARD SPACE FLIGHT CENTER, 1970 P 292-294
/SEE N71-25256 13-34/ 70/00/00 71N25326